Abstract

The cosparse analysis model has been introduced recently as an interesting alternative to the standard sparse synthesis 
approach. A prominent question brought up by this new construction is the analysis pursuit problem - the need to find 
a signal belonging to this model, given a set of corrupted measurements of it. Several pursuit methods have already 
been proposed based on relaxation and a greedy approach. In this work we pursue this question further, and propose 
a new family of pursuit algorithms for the cosparse analysis model, mimicking the greedy-like methods - compressive 
sampling matching pursuit (CoSaMP), subspace pursuit (SP), iterative hard thresholding (IHT) and hard thresholding 
pursuit (HTP). Assuming the availability of a near optimal projection scheme that finds the nearest cosparse subspace 
to any vector, we provide performance guarantees for these algorithms. Our theoretical study relies on a restricted 
isometry property adapted to the context of the cosparse analysis model. We explore empirically the performance of 
these algorithms by adopting a plain thresholding projection, demonstrating their good performance. 

Keywords: Sparse representations, Compressed sensing, Synthesis, Analysis, CoSaMP, Subspace pursuit, Iterative
hard thresholding, Hard thresholding pursuit.
2010 MSC: 94A20, 94A12, 62H12 



1. Introduction 

Many natural signals and images have been observed to be inherently low dimensional despite their possibly very 
high ambient signal dimension. It is by now well understood that this phenomenon lies at the heart of the success of 
numerous methods of signal and image processing. Sparsity-based models for signals offer an elegant and clear way
to enforce such inherent low-dimensionality, explaining their high popularity in recent years. These models consider
the signal $x \in \mathbb{R}^d$ as belonging to a finite union of subspaces of dimension $k \ll d$ [1]. In this paper we shall focus on
one such approach - the cosparse analysis model - and develop pursuit methods for it.

Before we dive into the details of the model assumed and the pursuit problem, let us first define the following 
generic inverse problem that will accompany us throughout the paper: For some unknown signal $x \in \mathbb{R}^d$, an incomplete
set of linear observations $y \in \mathbb{R}^m$ (incomplete implies $m < d$) is available via

$y = Mx + e$, (1)

where $e \in \mathbb{R}^m$ is an additive bounded noise that satisfies $\|e\|_2^2 \le \epsilon^2$. The task is to recover or approximate $x$. In the
noiseless setting where $e = 0$, this amounts to solving $y = Mx$. Of course, a simple fact in linear algebra tells us that
this problem admits infinitely many solutions (since $m < d$). Therefore, when all we have is the observation $y$ and the
measurement/observation matrix $M \in \mathbb{R}^{m \times d}$, we are in a hopeless situation to recover $x$.



'Corresponding author 



Preprint submitted to Special Issue in LAA on Sparse Approximate Solution of Linear Systems

July 11, 2012



1.1. The Synthesis Approach 

This is where 'sparse signal models' come into play. In the sparse synthesis model, the signal $x$ is assumed to have
a very sparse representation in a given fixed dictionary $D \in \mathbb{R}^{d \times n}$. In other words, there exists $\alpha$ with few nonzero
entries, as counted by the "$\ell_0$-norm" $\|\alpha\|_0$, such that

$x = D\alpha$, and $k := \|\alpha\|_0 \ll d$. (2)

Having this knowledge we solve (1) using $\hat{x}_{\ell_0} = D\hat{\alpha}_{\ell_0}$, where

$\hat{\alpha}_{\ell_0} = \operatorname{argmin}_{\alpha} \|\alpha\|_0$ subject to $\|y - MD\alpha\|_2 \le \epsilon$. (3)

More details about the properties of this problem can be found in [2, 3].

Since solving (3) is an NP-complete problem, approximation techniques are required for recovering $x$. One
strategy is by using relaxation, replacing the $\ell_0$ with the $\ell_1$ norm, resulting in the $\ell_1$-synthesis problem

$\hat{\alpha}_{\ell_1} = \operatorname{argmin}_{\alpha} \|\alpha\|_1$ s.t. $\|y - MD\alpha\|_2 \le \epsilon$. (4)

For a unitary matrix $D$ and a vector $x$ with $k$-sparse representation $\alpha$, if $\delta_{2k} < \delta_{\ell_1}$ then

$\|\hat{x}_{\ell_1} - x\|_2 \le C_{\ell_1} \|e\|_2$, (5)

where $\hat{x}_{\ell_1} = D\hat{\alpha}_{\ell_1}$, $\delta_{2k}$ is the constant of the restricted isometry property (RIP) of $MD$ for $2k$-sparse signals, $C_{\ell_1}$ is
a constant greater than $\sqrt{2}$ and $\delta_{\ell_1}$ ($\simeq 0.4652$) is a reference constant. Note that this result implies a perfect
recovery in the absence of noise. The above statement was extended also for incoherent redundant dictionaries [7].

Another option for approximating (3) is using a greedy strategy, like in the thresholding technique or orthogonal
matching pursuit (OMP) [8, 9]. A different related approach is the greedy-like family of algorithms. Among those
we have compressive sampling matching pursuit (CoSaMP) [10], subspace pursuit (SP) [11], iterative hard thresholding
(IHT) [12] and hard thresholding pursuit (HTP) [13]. CoSaMP and SP were the first greedy methods shown to have
guarantees in the form of (5), assuming $\delta_{4k} \le \delta_{CoSaMP}$ and $\delta_{3k} \le \delta_{SP}$ [10, 11, 6, 14]. Following their work, iterative
hard thresholding (IHT) and hard thresholding pursuit (HTP) were shown to have similar guarantees under similar
conditions [12, 13, 15, 6].



1.2. The Cosparse Analysis Model 

Recently, a new signal model called the cosparse analysis model was proposed in [16, 17]. The model can be summarized
as follows: For a fixed analysis operator $\Omega \in \mathbb{R}^{p \times d}$, referred to as the analysis dictionary, a signal $x \in \mathbb{R}^d$ belongs
to the cosparse analysis model with cosparsity $\ell$ if

$\ell := p - \|\Omega x\|_0$. (6)

The quantity $\ell$ is the number of rows in $\Omega$ that are orthogonal to the signal. The signal $x$ is said to be $\ell$-cosparse, or
simply cosparse. We denote the indices of the zeros of the analysis representation as the cosupport $\Lambda$. As the definition
of cosparsity suggests, the emphasis of the cosparse analysis model is on the zeros of the analysis representation vector
$\Omega x$. This contrasts the emphasis on 'few non-zeros' in the synthesis model (2). It is clear that in the case that every
$\ell$ rows in $\Omega$ are independent, $x$ resides in a subspace of dimension $d - \ell$. In the general case where dependencies
occur between the rows of $\Omega$, the dimension is $d$ minus the rank of the corresponding sub-matrix $\Omega_\Lambda$ that contains the
rows from $\Omega$ that belong to $\Lambda$. This is similar to the behavior in the synthesis case where a $k$-sparse signal lives in a
$k$-dimensional space. Thus, for this model to be effective, we assume a large value of $\ell$.
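The definition in (6) can be illustrated numerically. The following is a minimal sketch, assuming a first-order finite-difference operator as the analysis dictionary (an illustrative choice, not one prescribed here) and a small numerical tolerance for deciding which entries of $\Omega x$ count as zero; the function names are hypothetical.

```python
import numpy as np

def cosupport(omega, x, tol=1e-10):
    """Indices of the rows of omega that are (numerically) orthogonal to x."""
    return np.flatnonzero(np.abs(omega @ x) <= tol)

def cosparsity(omega, x, tol=1e-10):
    """ell = p - ||omega x||_0, i.e. the number of zeros in omega x."""
    return cosupport(omega, x, tol).size

def finite_difference_operator(d):
    """(d-1) x d first-order difference operator; row i computes x[i+1] - x[i]."""
    return np.eye(d - 1, d, k=1) - np.eye(d - 1, d)

omega = finite_difference_operator(6)
x = np.array([2.0, 2.0, 2.0, 5.0, 5.0, 5.0])   # piecewise constant with one jump

lam = cosupport(omega, x)
print(lam)                      # rows of omega with zero response
print(cosparsity(omega, x))     # ell
# Dimension of the subspace that contains x: d minus rank(Omega_Lambda)
print(omega.shape[1] - np.linalg.matrix_rank(omega[lam, :]))
```

For this piecewise-constant signal only the row that straddles the jump responds, so the cosupport contains the remaining four rows and the signal lives in a two-dimensional subspace.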

In the analysis model, recovering $x$ from the corrupted measurements is done by solving the following minimization
problem [18]:

$\hat{x}_{A-\ell_0} = \operatorname{argmin}_{x} \|\Omega x\|_0$ subject to $\|y - Mx\|_2 \le \epsilon$. (7)

Solving this problem is NP-complete [16], just as in the synthesis case, and thus approximation methods are required.
As before, we can use an $\ell_1$ relaxation to (7), replacing the $\ell_0$ with $\ell_1$ in (7), resulting in the $\ell_1$-analysis problem.

Another option is the greedy approach. A greedy algorithm called Greedy Analysis Pursuit (GAP)
has been developed in [16, 17, 21] that somehow mimics orthogonal matching pursuit [8, 9] with a form of iterative
reweighted least squares (IRLS) [22]. Other alternatives for OMP, backward greedy (BG) and orthogonal BG (OBG),
were presented in [23] for the case that $M$ is the identity. For the same case, the parallel to the thresholding technique
was analyzed in [24].



1.3. This Work 

Another avenue exists for the development of analysis pursuit algorithms - constructing methods that will imitate
the family of greedy-like algorithms. Indeed, we have recently presented preliminary and simplified versions of
analysis IHT (AIHT), analysis HTP (AHTP), analysis CoSaMP (ACoSaMP) and analysis SP (ASP) in [25, 26] as
analysis versions of the synthesis counterpart methods. This paper re-introduces these algorithms in a more general
form, ties them to their synthesis origins, and analyzes their expected performance. The main contribution of the paper
is our result on the stability of these analysis pursuit algorithms. We show that after a finite number of iterations and
for a given constant $c_0$, the reconstruction result $\hat{x}$ of AIHT, AHTP, ACoSaMP and ASP all satisfy

$\|x - \hat{x}\|_2 \le c_0 \|e\|_2$, (8)

under an RIP-like condition on $M$ and the assumption that we are given a good near optimal projection scheme. A
bound is also given for the case where $x$ is only nearly $\ell$-cosparse. Similar results for the $\ell_1$ analysis appear in [19, 20].
More details about the relation between these papers and our results will be given in Section 4. In addition to our
theoretical results we demonstrate the performance of the four pursuit methods under a thresholding based simple
projection scheme. Both our theoretical and empirical results show that linear dependencies in $\Omega$ that result in a
larger cosparsity in the signal $x$ lead to a better reconstruction performance. This suggests that, as opposed to the
synthesis case, strong linear dependencies within $\Omega$ are desired.
This paper is organized as follows:

• In Section 2 we define the notion of near optimal projection. Similarly, we also define an RIP-like property for
the analysis model. Both are used throughout this paper as a main force for deriving our theoretical results.

• In Section 3 the four pursuit algorithms for the cosparse analysis framework are defined, adapted to the general
format of the pursuit problem we have defined above.

• In Section 4 we derive the success guarantees for all the above algorithms in a unified way. Note that the provided
results can be easily adapted to other union-of-subspaces models given near optimal projection schemes
for them, in the same fashion done for IHT with an optimal projection scheme in [27]. The relation between the
obtained results and existing work appears in this section as well.

• Empirical performance of these algorithms is demonstrated in Section 5 in the context of the cosparse signal
recovery problem. We use a simple thresholding as the near optimal projection scheme in the greedy-like
techniques.

• Section 6 discusses the presented results and concludes our work.



2. Notations and Preliminaries 

2.1. General Definitions 

We use the following notation in our work: 

• $\sigma_M$ is the largest singular value of $M$, i.e., $\sigma_M^2 = \|M^*M\|_2$.

• $\|\cdot\|_2$ is the Euclidean norm for vectors and the spectral norm for matrices. $\|\cdot\|_1$ is the $\ell_1$ norm that sums the
absolute values of a vector and $\|\cdot\|_0$, though not really a norm, is the $\ell_0$-norm which counts the number of
non-zero elements in a vector.

• Given a cosupport set $\Lambda$, $\Omega_\Lambda$ is a sub-matrix of $\Omega$ with the rows that belong to $\Lambda$.

• For given vectors $v, z \in \mathbb{R}^d$ and an analysis dictionary $\Omega$, cosupp($\Omega v$) returns the cosupport of $\Omega v$ and cosupp($\Omega z$, $\ell$)
returns the index set of the $\ell$ smallest (in absolute value) elements in $\Omega z$. If more than $\ell$ elements are zero all of them are returned.

• $[\cdot]_\ell$ preserves the smallest $\ell$ elements in a vector and zeros the rest.

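• In a similar way, in the synthesis case $D_T$ is a sub-matrix of $D$ with columns corresponding to the set of indices
$T$, supp($\cdot$) returns the support of a vector, supp($\cdot$, $k$) returns the set of $k$-largest elements and $[\cdot]_k$ preserves the
$k$-largest elements in a vector.

• $Q_\Lambda = I - \Omega_\Lambda^\dagger \Omega_\Lambda$ is the orthogonal projection onto the orthogonal complement of range($\Omega_\Lambda^*$).

• $P_\Lambda = I - Q_\Lambda = \Omega_\Lambda^\dagger \Omega_\Lambda$ is the orthogonal projection onto range($\Omega_\Lambda^*$).

• $\hat{x}_{AIHT}$/$\hat{x}_{AHTP}$/$\hat{x}_{ACoSaMP}$/$\hat{x}_{ASP}$ are the reconstruction results of AIHT/AHTP/ACoSaMP/ASP respectively. Sometimes
when it is clear from the context to which algorithms we refer, we abuse notations and use $\hat{x}$ to denote the
reconstruction result.

• A vector $v$ has a corank $r$ if $\Omega_\Lambda v = 0$ and rank($\Omega_\Lambda$) $= r$.

• $[p]$ denotes the set of integers $[1 \ldots p]$.

• $\mathcal{L}_{\Omega,\ell} = \{\Lambda \subseteq [p], |\Lambda| \ge \ell\}$ is the set of $\ell$-cosparse cosupports and $\mathcal{L}_{\Omega,r}^{corank} = \{\Lambda \subseteq [p], \operatorname{rank}(\Omega_\Lambda) \ge r\}$ is the set of
all cosupports with corresponding corank $r$.

• $\mathcal{W}_\Lambda = \operatorname{span}^\perp(\Omega_\Lambda) = \{Q_\Lambda z, z \in \mathbb{R}^d\}$ is the subspace spanned by a cosparsity set $\Lambda$.

• $\mathcal{A}_{\Omega,\ell} = \bigcup_{\Lambda \in \mathcal{L}_{\Omega,\ell}} \mathcal{W}_\Lambda$ is the union of subspaces of $\ell$-cosparse vectors and $\mathcal{A}_{\Omega,r}^{corank} = \bigcup_{\Lambda \in \mathcal{L}_{\Omega,r}^{corank}} \mathcal{W}_\Lambda$ is the union
of subspaces of all vectors with corank $r$. In the case that every $\ell$ rows of $\Omega$ are independent it is clear that
$\mathcal{A}_{\Omega,\ell} = \mathcal{A}_{\Omega,\ell}^{corank}$. When it will be clear from the context, we will remove $\Omega$ from the subscript.

• $x \in \mathbb{R}^d$ denotes the original unknown $\ell$-cosparse vector and $\Lambda_x$ its cosupport.

• $v, u \in \mathcal{A}_\ell$ are used to denote general $\ell$-cosparse vectors and $z \in \mathbb{R}^d$ is used to denote a general vector.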
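The projections in the list above are straightforward to form numerically for small problems. The sketch below assumes real-valued matrices (so the adjoint is the transpose), a non-empty cosupport, and a randomly drawn operator that is used only as an example; the helper name `cosupport_projections` is hypothetical.

```python
import numpy as np

def cosupport_projections(omega, cosupp):
    """Return (Q, P) for a non-empty cosupport given as a list of row indices of omega."""
    d = omega.shape[1]
    omega_l = omega[list(cosupp), :]             # sub-matrix Omega_Lambda
    p_mat = np.linalg.pinv(omega_l) @ omega_l    # projection onto range(Omega_Lambda^*)
    q_mat = np.eye(d) - p_mat                    # projection onto the orthogonal complement
    return q_mat, p_mat

rng = np.random.default_rng(0)
omega = rng.standard_normal((8, 5))    # arbitrary example operator with p = 8, d = 5
cosupp = [0, 3, 6]
q_mat, p_mat = cosupport_projections(omega, cosupp)
z = rng.standard_normal(5)

corank = np.linalg.matrix_rank(omega[cosupp, :])
# Q z has the requested rows of omega orthogonal to it:
print(np.allclose(omega[cosupp, :] @ (q_mat @ z), 0.0))
# The trace of an orthogonal projection equals the dimension of its range, d - corank:
print(corank, int(round(np.trace(q_mat))))
```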

2.2. $\Omega$-RIP Definition and its Properties

We now turn to define the $\Omega$-RIP, which parallels the regular RIP as used in [5].

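Definition 2.1. A matrix $M$ has the $\Omega$-RIP property with constants $\delta_\ell$¹ and $\delta_r^{corank}$, if $\delta_\ell$ and $\delta_r^{corank}$ are the smallest
constants that satisfy

$(1 - \delta_\ell) \|v\|_2^2 \le \|Mv\|_2^2 \le (1 + \delta_\ell) \|v\|_2^2$ (9)
$(1 - \delta_r^{corank}) \|u\|_2^2 \le \|Mu\|_2^2 \le (1 + \delta_r^{corank}) \|u\|_2^2$ (10)

for every $v \in \mathcal{A}_\ell$ and $u \in \mathcal{A}_r^{corank}$.

The $\Omega$-RIP, like the regular RIP, inherits several key properties, the first of which is an immediate corollary.

Corollary 2.2. If $M$ satisfies the $\Omega$-RIP with constants $\delta_\ell$ and $\delta_r^{corank}$ then

$\|MQ_\Lambda\|_2^2 \le 1 + \delta_\ell$ (11)
$\|MQ_\Lambda\|_2^2 \le 1 + \delta_r^{corank}$

for any $\Lambda \in \mathcal{L}_\ell$ and $\Lambda \in \mathcal{L}_r^{corank}$, respectively.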



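¹Though $\delta_\ell$ is also a function of $\Omega$ we abuse notation and use the same symbol for the $\Omega$-RIP as the regular RIP. It will be clear from the context
to which of them we refer and what $\Omega$ is in use with the $\Omega$-RIP.

Proof: Any $v \in \mathcal{A}_\ell$ can be represented as $v = Q_\Lambda z$ with $\Lambda \in \mathcal{L}_\ell$ and $z \in \mathbb{R}^d$. Thus, the $\Omega$-RIP in (9) can be reformulated
as

$(1 - \delta_\ell) \|Q_\Lambda z\|_2^2 \le \|MQ_\Lambda z\|_2^2 \le (1 + \delta_\ell) \|Q_\Lambda z\|_2^2$ (12)

for any $z \in \mathbb{R}^d$ and $\Lambda \in \mathcal{L}_\ell$. Since $Q_\Lambda$ is a projection, $\|Q_\Lambda z\|_2^2 \le \|z\|_2^2$. Combining this with the right inequality in (12)
gives

$\|MQ_\Lambda z\|_2^2 \le (1 + \delta_\ell) \|z\|_2^2$ (13)

for any $z \in \mathbb{R}^d$ and $\Lambda \in \mathcal{L}_\ell$. The first inequality in (11) follows from (13) by the definition of the spectral norm. For
the second inequality in (11) the proof is identical, replacing $\mathcal{L}_\ell$ with $\mathcal{L}_r^{corank}$. □

Lemma 2.3. For $\tilde{\ell} \le \ell$ and $\tilde{r} \le r$ it holds that $\delta_\ell \le \delta_{\tilde{\ell}}$ and $\delta_r^{corank} \le \delta_{\tilde{r}}^{corank}$.

Proof: Since $\mathcal{A}_\ell \subseteq \mathcal{A}_{\tilde{\ell}}$ and $\mathcal{A}_r^{corank} \subseteq \mathcal{A}_{\tilde{r}}^{corank}$ the claim is immediate. □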

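Lemma 2.4. $M$ satisfies the $\Omega$-RIP if and only if

$\|Q_\Lambda(I - M^*M)Q_\Lambda\|_2 \le \delta_\ell$ (14)
$\|Q_\Lambda(I - M^*M)Q_\Lambda\|_2 \le \delta_r^{corank}$

for any $\Lambda \in \mathcal{L}_\ell$ and $\Lambda \in \mathcal{L}_r^{corank}$, respectively.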

Proof: The proof is similar to the one of the regular RIP as appears in [6]. We present only the proof for $\delta_\ell$ since the
proof for $\delta_r^{corank}$ is almost identical. As a first step we observe that Definition 2.1 is equivalent to requiring

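$\left| \|Mv\|_2^2 - \|v\|_2^2 \right| \le \delta_\ell \|v\|_2^2$ (15)

for any $v \in \mathcal{A}_\ell$. The last is equivalent to

$\left| \|MQ_\Lambda z\|_2^2 - \|Q_\Lambda z\|_2^2 \right| \le \delta_\ell \|Q_\Lambda z\|_2^2$ (16)

for any set $\Lambda \in \mathcal{L}_\ell$ and any $z \in \mathbb{R}^d$, since $Q_\Lambda z \in \mathcal{A}_\ell$. Next we notice that

$\|MQ_\Lambda z\|_2^2 - \|Q_\Lambda z\|_2^2 = z^*Q_\Lambda M^*MQ_\Lambda z - z^*Q_\Lambda z = \langle Q_\Lambda(M^*M - I)Q_\Lambda z, z \rangle$.

Since $Q_\Lambda(M^*M - I)Q_\Lambda$ is Hermitian we have that

$\max_{z} \frac{|\langle Q_\Lambda(M^*M - I)Q_\Lambda z, z \rangle|}{\|z\|_2^2} = \|Q_\Lambda(M^*M - I)Q_\Lambda\|_2$. (17)

Thus we have that Definition 2.1 is equivalent to (14) for any set $\Lambda \in \mathcal{L}_\ell$. □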
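Corollary 2.5. If $M$ satisfies the $\Omega$-RIP then

$\|Q_{\Lambda_1}(I - M^*M)Q_{\Lambda_2}\|_2 \le \delta_\ell$, (18)
$\|Q_{\Lambda_1}(I - M^*M)Q_{\Lambda_2}\|_2 \le \delta_r^{corank}$

for any $\Lambda_1$ and $\Lambda_2$ such that $\Lambda_1 \cap \Lambda_2 \in \mathcal{L}_\ell$ and any $\Lambda_1$ and $\Lambda_2$ such that $\Lambda_1 \cap \Lambda_2 \in \mathcal{L}_r^{corank}$, respectively.

Proof: Since $\Lambda_1 \cap \Lambda_2 \subseteq \Lambda_1$ and $\Lambda_1 \cap \Lambda_2 \subseteq \Lambda_2$,

$\|Q_{\Lambda_1}(I - M^*M)Q_{\Lambda_2}\|_2 \le \|Q_{\Lambda_1 \cap \Lambda_2}(I - M^*M)Q_{\Lambda_1 \cap \Lambda_2}\|_2$.

The same argument holds also for $\delta_r^{corank}$. Using Lemma 2.4 completes the proof. □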
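For very small problems the characterization in Lemma 2.4 can be evaluated exhaustively. The sketch below assumes real matrices and enumerates only cosupports of size exactly $\ell$; this suffices because enlarging a cosupport shrinks the corresponding subspace and cannot increase the norm in (14). The function names and the random test matrices are illustrative assumptions.

```python
import itertools
import numpy as np

def projection(omega, cosupp):
    """Q_Lambda = I - pinv(Omega_Lambda) Omega_Lambda for a non-empty cosupport."""
    omega_l = omega[list(cosupp), :]
    return np.eye(omega.shape[1]) - np.linalg.pinv(omega_l) @ omega_l

def omega_rip_constant(m_mat, omega, ell):
    """max over |Lambda| = ell of ||Q_Lambda (I - M^T M) Q_Lambda||_2 (exhaustive search)."""
    p, d = omega.shape
    gram_gap = np.eye(d) - m_mat.T @ m_mat
    best = 0.0
    for cosupp in itertools.combinations(range(p), ell):
        q_mat = projection(omega, cosupp)
        best = max(best, np.linalg.norm(q_mat @ gram_gap @ q_mat, 2))
    return best

rng = np.random.default_rng(1)
m_mat = rng.standard_normal((4, 5)) / np.sqrt(4)   # example measurement matrix
omega = rng.standard_normal((7, 5))                # example analysis operator

delta_3 = omega_rip_constant(m_mat, omega, 3)
delta_4 = omega_rip_constant(m_mat, omega, 4)
print(delta_3, delta_4)   # Lemma 2.3 predicts delta_4 <= delta_3
```

The cost grows combinatorially with $p$, so this is only a sanity check on toy sizes, not a practical way to certify the property.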
2.3. Near Optimal Projection

Given a general vector $z \in \mathbb{R}^d$, we would like to find an $\ell$-cosparse vector that is closest to it in the $\ell_2$-norm sense.
In other words, we would like to project the vector to the closest $\ell$-cosparse subspace. Given the cosupport $\Lambda$ of this
space the solution is simply $Q_\Lambda z$. Thus, the problem of finding the closest $\ell$-cosparse vector turns to be the problem
of finding the cosupport of the closest $\ell$-cosparse subspace. We denote the procedure of finding this cosupport by

$\mathcal{S}_\ell^*(z) = \operatorname{argmin}_{\Lambda \in \mathcal{L}_\ell} \|z - Q_\Lambda z\|_2^2$. (19)

In the representation domain in the synthesis case, the support of the closest $k$-sparse subspace is found simply by
hard thresholding, i.e., taking the support of the $k$-largest elements. However, in the analysis case calculating (19)
seems to be combinatorial with no efficient method for doing it. Thus an approximation procedure $\hat{\mathcal{S}}_\ell$ is needed. For
this purpose we introduce the definition of a near-optimal projection [25].

Definition 2.6. A procedure $\hat{\mathcal{S}}_\ell$ implies a near-optimal projection $Q_{\hat{\mathcal{S}}_\ell(\cdot)}$ with a constant $C_\ell$ if for any $z \in \mathbb{R}^d$

$\|z - Q_{\hat{\mathcal{S}}_\ell(z)} z\|_2^2 \le C_\ell \|z - Q_{\mathcal{S}_\ell^*(z)} z\|_2^2$. (20)

A clear implication of this definition is that if $\hat{\mathcal{S}}_\ell$ implies a near-optimal projection with a constant $C_\ell$ then for any
vector $z \in \mathbb{R}^d$ and an $\ell$-cosparse vector $v \in \mathbb{R}^d$

$\|z - Q_{\hat{\mathcal{S}}_\ell(z)} z\|_2^2 \le C_\ell \|z - v\|_2^2$. (21)

Similarly to the $\Omega$-RIP, the above discussion can be directed also for finding the closest vector with corank $r$,
defining $\mathcal{S}_r^{corank*}$ and a near optimal projection for this case in a very similar way to (19) and Definition 2.6, respectively.
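To make Definition 2.6 concrete, the ratio between the two sides of (20) can be measured on toy problems where the optimal cosupport in (19) is found by brute force. The sketch below uses plain thresholding as the approximate selection procedure and a random operator; both are illustrative assumptions, and the thresholding helper returns exactly $\ell$ indices (it does not implement the "return all zeros" convention). The largest observed ratio is only a lower estimate of the constant $C_\ell$ for that operator.

```python
import itertools
import numpy as np

def projection(omega, cosupp):
    """Q_Lambda = I - pinv(Omega_Lambda) Omega_Lambda for a non-empty cosupport."""
    omega_l = omega[list(cosupp), :]
    return np.eye(omega.shape[1]) - np.linalg.pinv(omega_l) @ omega_l

def threshold_cosupport(omega, z, ell):
    """Indices of the ell smallest entries of |omega z| (simple thresholding)."""
    return np.sort(np.argsort(np.abs(omega @ z), kind="stable")[:ell])

def optimal_cosupport(omega, z, ell):
    """Exhaustive search over cosupports of size ell; feasible only for tiny p."""
    best_err, best = np.inf, None
    for cosupp in itertools.combinations(range(omega.shape[0]), ell):
        err = float(np.sum((z - projection(omega, cosupp) @ z) ** 2))
        if err < best_err:
            best_err, best = err, np.array(cosupp)
    return best, best_err

def empirical_ratio(omega, z, ell):
    """Left side of (20) divided by the optimal residual, for one vector z."""
    thr = threshold_cosupport(omega, z, ell)
    err_thr = float(np.sum((z - projection(omega, thr) @ z) ** 2))
    _, err_opt = optimal_cosupport(omega, z, ell)
    return err_thr / err_opt

rng = np.random.default_rng(2)
omega = rng.standard_normal((8, 5))
ratios = [empirical_ratio(omega, rng.standard_normal(5), 3) for _ in range(20)]
print(max(ratios))   # empirical lower estimate of C_ell for this omega
```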

2.4. Problem Definition 

With the above notations and definitions we restate the problem we aim at solving. 

Definition 2.7 (Problem P). Consider a measurement vector $y \in \mathbb{R}^m$ such that $y = Mx + e$ where $x \in \mathbb{R}^d$ is $\ell$-cosparse,
$M \in \mathbb{R}^{m \times d}$ is a degradation operator and $e \in \mathbb{R}^m$ is a bounded additive noise. The largest singular value of
$M$ is $\sigma_M$ and its $\Omega$-RIP constant is $\delta_\ell$. A procedure $\hat{\mathcal{S}}_\ell$ for finding a cosupport that implies a near optimal projection
with a constant $C_\ell$ is assumed to be at hand. Our task is to recover $x$ from $y$. The recovery result is denoted by $\hat{x}$.



3. New Analysis algorithms 

3.1. Quick Review of the Greedy-Like Methods 

Before we turn to present the analysis versions of the greedy-like techniques we recall their synthesis versions.
These use a prior knowledge about the cardinality $k$ and actually aim at approximating a variant of (3)

$\operatorname{argmin}_{\alpha} \|y - MD\alpha\|_2^2$ subject to $\|\alpha\|_0 \le k$. (22)

For simplicity we shall present the greedy-like pursuits for the case $D = I$. In the general case $M$ should be replaced
with $MD$, $x$ with $\alpha$ and the reconstruction result should be $\hat{x} = D\hat{\alpha}$. In addition, in the algorithms' description we do
not specify the stopping criterion. Any standard stopping criterion, like residual's size or relative iteration change, can
be used. More details can be found in ll 



IHT and HTP: IHT [12] and HTP [13] are presented in Algorithm 1. Each IHT iteration is composed of two basic
steps. The first is a gradient step, with a step size $\mu^t$, in the direction of minimizing $\|y - Mx\|_2^2$. The step size can be
either constant in all iterations ($\mu^t = \mu$) or changing [28]. The result vector $x_g$ is not guaranteed to be sparse and thus
the second step of IHT projects $x_g$ to the closest $k$-sparse subspace by keeping its largest $k$ elements. The HTP takes
a different strategy in the projection step. Instead of using a simple projection to the closest $k$-sparse subspace, HTP
selects the vector in this subspace that minimizes $\|y - Mx\|_2^2$ [13, 29].
Algorithm 1 Iterative hard thresholding (IHT) and hard thresholding pursuit (HTP)
Require: $k$, $M$, $y$ where $y = Mx + e$, $k$ is the cardinality of $x$ and $e$ is an additive noise.
Ensure: $\hat{x}_{IHT}$ or $\hat{x}_{HTP}$: $k$-sparse approximation of $x$.

Initialize representation $\hat{x}^0 = 0$ and set $t = 0$.

while halting criterion is not satisfied do
    $t = t + 1$.
    Perform a gradient step: $x_g = \hat{x}^{t-1} + \mu^t M^*(y - M\hat{x}^{t-1})$
    Find a new support: $T^t = \operatorname{supp}(x_g, k)$
    Calculate a new representation: $\hat{x}_{IHT}^t = (x_g)_{T^t}$ for IHT, and $\hat{x}_{HTP}^t = M_{T^t}^\dagger y$ for HTP.
end while

Form the final solution $\hat{x}_{IHT} = \hat{x}_{IHT}^t$ for IHT and $\hat{x}_{HTP} = \hat{x}_{HTP}^t$ for HTP.
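The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: it assumes real matrices, a fixed number of iterations instead of a halting criterion, and, when no step size is given, the conservative constant choice $\mu = 1/\sigma_M^2$ (an assumption made here only to keep the gradient step from diverging). The function names are hypothetical.

```python
import numpy as np

def top_k_support(v, k):
    """supp(v, k): indices of the k largest-magnitude entries of v."""
    return np.sort(np.argsort(-np.abs(v), kind="stable")[:k])

def iht_htp(m_mat, y, k, variant="IHT", mu=None, n_iter=100):
    """Run IHT (variant="IHT") or HTP (variant="HTP") for n_iter iterations."""
    d = m_mat.shape[1]
    if mu is None:
        mu = 1.0 / np.linalg.norm(m_mat, 2) ** 2   # assumed default step size
    x = np.zeros(d)
    for _ in range(n_iter):
        x_g = x + mu * m_mat.T @ (y - m_mat @ x)    # gradient step
        support = top_k_support(x_g, k)             # new support
        x = np.zeros(d)
        if variant == "IHT":
            x[support] = x_g[support]               # keep the k largest entries
        else:
            # HTP: least squares restricted to the selected columns
            x[support] = np.linalg.lstsq(m_mat[:, support], y, rcond=None)[0]
    return x

rng = np.random.default_rng(3)
m, d, k = 40, 80, 4
m_mat = rng.standard_normal((m, d)) / np.sqrt(m)
x_true = np.zeros(d)
x_true[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
y = m_mat @ x_true
for variant in ("IHT", "HTP"):
    x_hat = iht_htp(m_mat, y, k, variant=variant)
    print(variant, np.linalg.norm(x_hat - x_true))
```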



Algorithm 2 Subspace Pursuit (SP) and CoSaMP

Require: $k$, $M$, $y$ where $y = Mx + e$, $k$ is the cardinality of $x$ and $e$ is an additive noise. $a = 1$ (SP), $a = 2$ (CoSaMP).
Ensure: $\hat{x}_{CoSaMP}$ or $\hat{x}_{SP}$: $k$-sparse approximation of $x$.

Initialize the support $T^0 = \emptyset$, the residual $y_{resid}^0 = y$ and set $t = 0$.

while halting criterion is not satisfied do
    $t = t + 1$.
    Find new support elements: $T_\Delta = \operatorname{supp}(M^* y_{resid}^{t-1}, ak)$.
    Update the support: $\tilde{T}^t = T^{t-1} \cup T_\Delta$.
    Compute a temporary representation: $x_p = M_{\tilde{T}^t}^\dagger y$.
    Prune small entries: $T^t = \operatorname{supp}(x_p, k)$.
    Calculate a new representation: $\hat{x}_{CoSaMP}^t = (x_p)_{T^t}$ for CoSaMP, and $\hat{x}_{SP}^t = M_{T^t}^\dagger y$ for SP.
    Update the residual: $y_{resid}^t = y - M\hat{x}_{CoSaMP}^t$ for CoSaMP, and $y_{resid}^t = y - M\hat{x}_{SP}^t$ for SP.
end while

Form the final solution $\hat{x}_{CoSaMP} = \hat{x}_{CoSaMP}^t$ for CoSaMP and $\hat{x}_{SP} = \hat{x}_{SP}^t$ for SP.
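Algorithm 2 admits an equally short sketch. As before, this assumes real matrices and a fixed iteration count in place of a halting criterion, and the names are hypothetical; the only difference between the two variants is the value of $a$ and the final projection step.

```python
import numpy as np

def top_k_support(v, k):
    """supp(v, k): indices of the k largest-magnitude entries of v."""
    return np.sort(np.argsort(-np.abs(v), kind="stable")[:k])

def restricted_least_squares(m_mat, y, support):
    """Vector supported on `support` that minimizes ||y - M v||_2."""
    v = np.zeros(m_mat.shape[1])
    v[support] = np.linalg.lstsq(m_mat[:, support], y, rcond=None)[0]
    return v

def sp_cosamp(m_mat, y, k, variant="SP", n_iter=30):
    """Run SP (a = 1) or CoSaMP (a = 2) for n_iter iterations."""
    a = 1 if variant == "SP" else 2
    d = m_mat.shape[1]
    support = np.array([], dtype=int)
    x = np.zeros(d)
    resid = y.copy()
    for _ in range(n_iter):
        t_delta = top_k_support(m_mat.T @ resid, a * k)      # new support elements
        t_tilde = np.union1d(support, t_delta)               # merged support
        x_p = restricted_least_squares(m_mat, y, t_tilde)    # temporary representation
        support = top_k_support(x_p, k)                      # prune small entries
        if variant == "CoSaMP":
            x = np.zeros(d)
            x[support] = x_p[support]                        # plain hard thresholding
        else:
            x = restricted_least_squares(m_mat, y, support)  # objective aware projection
        resid = y - m_mat @ x
    return x

rng = np.random.default_rng(4)
m, d, k = 40, 80, 4
m_mat = rng.standard_normal((m, d)) / np.sqrt(m)
x_true = np.zeros(d)
x_true[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
y = m_mat @ x_true
for variant in ("SP", "CoSaMP"):
    print(variant, np.linalg.norm(sp_cosamp(m_mat, y, k, variant=variant) - x_true))
```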



CoSaMP and SP: CoSaMP [10] and SP [11] are presented in Algorithm 2. The difference between these two
techniques is similar to the difference between IHT and HTP. Unlike IHT and HTP, the estimate for the support of $x$
in each CoSaMP and SP iteration is computed by observing the residual $y_{resid}^t = y - M\hat{x}^t$. In each iteration, CoSaMP
and SP extract new support indices from the residual by taking the indices of the largest elements in $M^* y_{resid}^{t-1}$. They
add the new indices to the estimated support set from the previous iteration, creating a new estimated support $\tilde{T}^t$ with
cardinality larger than $k$. Having the updated support, in a similar way to the projection in HTP, an objective aware
projection is performed resulting with an estimate $x_p$ for $x$ that is supported on $\tilde{T}^t$. Since we know that $x$ is $k$-sparse
we want to project $x_p$ to a $k$-sparse subspace. CoSaMP does it by simple hard thresholding like in IHT. SP does it by
an objective aware projection similar to HTP.

3.2. Analysis greedy-like methods 

Given the synthesis greedy-like pursuits, we would like to define their analysis counterparts. For this task we need 
to 'translate' each synthesis operation into an analysis one. This gives us a general recipe for converting algorithms 
between the two schemes. The parallel lines between the schemes are presented in Table 1. Those become more
intuitive and clear when we keep in mind that while the synthesis approach focuses on the non-zeros, the analysis
concentrates on the zeros.

For clarity we dwell a bit more on the equivalences. For the cosupport selection, as mentioned in Section 2,
computing the optimal cosupport is a combinatorial problem and thus the approximation $\hat{\mathcal{S}}_\ell$ is used. One intuitive
option for it is the simple thresholding

$\hat{\mathcal{S}}_\ell(z) = \operatorname{cosupp}(\Omega z, \ell)$, (23)

which selects as a cosupport the indices of the $\ell$-smallest elements after applying $\Omega$ on $z$. Though similar to the hard
thresholding used in synthesis which yields the optimal support, in analysis it is not guaranteed to find the optimal
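| Synthesis operation name | Synthesis operation | Analysis operation name | Analysis operation |
|---|---|---|---|
| Support selection | Largest $k$ elements: $T = \operatorname{supp}(\cdot, k)$ | Cosupport selection | Using a near optimal projection: $\Lambda = \hat{\mathcal{S}}_\ell(\cdot)$ |
| Orthogonal projection of $z$ to a $k$-sparse subspace with support $T$ | $z_T$ | Orthogonal projection of $z$ to an $\ell$-cosparse subspace with cosupport $\Lambda$ | $Q_\Lambda z$ |
| Objective aware projection to a $k$-sparse subspace with support $T$ | $M_T^\dagger y = \operatorname{argmin}_v \lVert y - Mv \rVert_2^2$ s.t. $v_{T^C} = 0$ | Objective aware projection to an $\ell$-cosparse subspace with cosupport $\Lambda$ | $\operatorname{argmin}_v \lVert y - Mv \rVert_2^2$ s.t. $\Omega_\Lambda v = 0$ |
| Support of $v_1 + v_2$ where $\operatorname{supp}(v_1) = T_1$ and $\operatorname{supp}(v_2) = T_2$ | $\operatorname{supp}(v_1 + v_2) \subseteq T_1 \cup T_2$ | Cosupport of $v_1 + v_2$ where $\operatorname{cosupp}(v_1) = \Lambda_1$ and $\operatorname{cosupp}(v_2) = \Lambda_2$ | $\operatorname{cosupp}(v_1 + v_2) \supseteq \Lambda_1 \cap \Lambda_2$ |
| Maximal size of $T_1 \cup T_2$ where $\lvert T_1 \rvert \le k_1$ and $\lvert T_2 \rvert \le k_2$ | $\lvert T_1 \cup T_2 \rvert \le k_1 + k_2$ | Minimal size of $\Lambda_1 \cap \Lambda_2$ where $\lvert \Lambda_1 \rvert \ge \ell_1$ and $\lvert \Lambda_2 \rvert \ge \ell_2$ | $\lvert \Lambda_1 \cap \Lambda_2 \rvert \ge \ell_1 + \ell_2 - p$ |

Table 1: Parallel synthesis and analysis operations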



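cosupport. Having a selected cosupport $\Lambda$, the projection to its corresponding cosparse subspace becomes trivial,
given by $Q_\Lambda$.

Given two vectors $v_1 \in \mathcal{A}_{\ell_1}$ and $v_2 \in \mathcal{A}_{\ell_2}$ such that $\Lambda_1 = \operatorname{cosupp}(\Omega v_1)$ and $\Lambda_2 = \operatorname{cosupp}(\Omega v_2)$, we know that $|\Lambda_1| \ge \ell_1$
and $|\Lambda_2| \ge \ell_2$. Denoting $T_1 = \operatorname{supp}(\Omega v_1)$ and $T_2 = \operatorname{supp}(\Omega v_2)$ it is clear that $\operatorname{supp}(\Omega(v_1 + v_2)) \subseteq T_1 \cup T_2$. Noticing that
$\operatorname{supp}(\cdot) = \operatorname{cosupp}(\cdot)^C$ it is clear that $|T_1| \le p - \ell_1$, $|T_2| \le p - \ell_2$ and $\operatorname{cosupp}(\Omega(v_1 + v_2)) \supseteq (T_1 \cup T_2)^C = T_1^C \cap T_2^C = \Lambda_1 \cap \Lambda_2$.
From the last equality we can also deduce that $|\Lambda_1 \cap \Lambda_2| = p - |T_1 \cup T_2| \ge p - (p - \ell_1) - (p - \ell_2) = \ell_1 + \ell_2 - p$.

With the above observations we can develop the analysis versions of the greedy-like algorithms. As in the synthesis
case, we do not specify a stopping criterion. Any stopping criterion used for the synthesis versions can be used also
for the analysis ones.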



Algorithm 3 Analysis iterative hard thresholding (AIHT) and analysis hard thresholding pursuit (AHTP)
Require: $\ell$, $M$, $\Omega$, $y$ where $y = Mx + e$, $\ell$ is the cosparsity of $x$ under $\Omega$ and $e$ is the additive noise.
Ensure: $\hat{x}_{AIHT}$ or $\hat{x}_{AHTP}$: $\ell$-cosparse approximation of $x$.

Initialize estimate $\hat{x}^0 = 0$ and set $t = 0$.

while halting criterion is not satisfied do
    $t = t + 1$.
    Perform a gradient step: $x_g = \hat{x}^{t-1} + \mu^t M^*(y - M\hat{x}^{t-1})$
    Find a new cosupport: $\Lambda^t = \hat{\mathcal{S}}_\ell(x_g)$
    Calculate a new estimate: $\hat{x}_{AIHT}^t = Q_{\Lambda^t} x_g$ for AIHT, and $\hat{x}_{AHTP}^t = \operatorname{argmin}_v \|y - Mv\|_2^2$ s.t. $\Omega_{\Lambda^t} v = 0$ for AHTP.
end while

Form the final solution $\hat{x}_{AIHT} = \hat{x}_{AIHT}^t$ for AIHT and $\hat{x}_{AHTP} = \hat{x}_{AHTP}^t$ for AHTP.
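A minimal NumPy sketch of Algorithm 3 follows. It makes several assumptions that are not part of the algorithm statement: real matrices, plain thresholding as the cosupport selection procedure, a fixed iteration count, a default constant step $\mu = 1/\sigma_M^2$, and an SVD-based null-space basis $N$ of $\Omega_\Lambda$, so that $Q_\Lambda = NN^T$ and the constrained least squares becomes an unconstrained one in the coefficients of $N$. All function names are hypothetical.

```python
import numpy as np

def null_space_basis(a_mat, tol=1e-10):
    """Orthonormal basis (columns) of {v : a_mat v = 0}, computed from the SVD."""
    d = a_mat.shape[1]
    if a_mat.shape[0] == 0:
        return np.eye(d)
    _, s, vt = np.linalg.svd(a_mat, full_matrices=True)
    rank = int(np.sum(s > tol * max(1.0, s[0])))
    return vt[rank:].T

def threshold_cosupport(omega, z, ell):
    """Indices of the ell smallest entries of |omega z| (simple thresholding)."""
    return np.sort(np.argsort(np.abs(omega @ z), kind="stable")[:ell])

def aiht_ahtp(m_mat, omega, y, ell, variant="AIHT", mu=None, n_iter=100):
    """Run AIHT or AHTP for n_iter iterations with a constant step size."""
    d = m_mat.shape[1]
    if mu is None:
        mu = 1.0 / np.linalg.norm(m_mat, 2) ** 2   # assumed default step size
    x = np.zeros(d)
    for _ in range(n_iter):
        x_g = x + mu * m_mat.T @ (y - m_mat @ x)          # gradient step
        cosupp = threshold_cosupport(omega, x_g, ell)     # new cosupport
        basis = null_space_basis(omega[cosupp, :])
        if variant == "AIHT":
            x = basis @ (basis.T @ x_g)                   # Q_Lambda x_g
        elif basis.shape[1] == 0:
            x = np.zeros(d)                               # subspace is {0}
        else:
            # AHTP: minimize ||y - M v|| over v = basis @ c
            coeffs = np.linalg.lstsq(m_mat @ basis, y, rcond=None)[0]
            x = basis @ coeffs
    return x

# Toy check with M = I and a finite-difference operator (illustrative choices)
omega = np.eye(5, 6, k=1) - np.eye(5, 6)
x_true = np.array([2.0, 2.0, 2.0, 5.0, 5.0, 5.0])
for variant in ("AIHT", "AHTP"):
    x_hat = aiht_ahtp(np.eye(6), omega, x_true, ell=4, variant=variant, mu=1.0, n_iter=5)
    print(variant, np.linalg.norm(x_hat - x_true))
```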



AIHT and AHTP: Analysis IHT (AIHT) and analysis HTP (AHTP) are presented in Algorithm 3. As in the
synthesis case, the choice of the gradient stepsize $\mu^t$ is crucial: If the $\mu^t$'s are chosen too small, the algorithm gets stuck
at a wrong solution and if too large, the algorithm diverges. We consider two options for $\mu^t$.

In the first we choose $\mu^t = \mu$ for some constant $\mu$ for all iterations. A theoretical discussion on how to choose $\mu$
properly is given in Section 4.1.

The second option is to select a different $\mu$ in each iteration. One way for doing it is to choose an 'optimal' stepsize
$\mu^t$ by solving the following problem

$\mu^t := \operatorname{argmin}_{\mu} \|y - M\hat{x}^t\|_2^2$. (24)
/J 
Algorithm 4 Analysis Subspace Pursuit (ASP) and Analysis CoSaMP (ACoSaMP)

Require: $\ell$, $M$, $\Omega$, $y$, $a$ where $y = Mx + e$, $\ell$ is the cosparsity of $x$ under $\Omega$ and $e$ is the additive noise.
Ensure: $\hat{x}_{ACoSaMP}$ or $\hat{x}_{ASP}$: $\ell$-cosparse approximation of $x$.

Initialize the cosupport $\Lambda^0 = \{i, 1 \le i \le p\}$, the residual $y_{resid}^0 = y$ and set $t = 0$.

while halting criterion is not satisfied do
    $t = t + 1$.
    Find new cosupport elements: $\Lambda_\Delta = \hat{\mathcal{S}}_{a\ell}(M^* y_{resid}^{t-1})$.
    Update the cosupport: $\tilde{\Lambda}^t = \Lambda^{t-1} \cap \Lambda_\Delta$.
    Compute a temporary estimate: $x_p = \operatorname{argmin}_{\tilde{x}} \|y - M\tilde{x}\|_2^2$ s.t. $\Omega_{\tilde{\Lambda}^t} \tilde{x} = 0$.
    Enlarge the cosupport: $\Lambda^t = \hat{\mathcal{S}}_\ell(x_p)$.
    Calculate a new estimate: $\hat{x}_{ACoSaMP}^t = Q_{\Lambda^t} x_p$ for ACoSaMP, and $\hat{x}_{ASP}^t = \operatorname{argmin}_{\tilde{x}} \|y - M\tilde{x}\|_2^2$ s.t. $\Omega_{\Lambda^t} \tilde{x} = 0$ for ASP.
    Update the residual: $y_{resid}^t = y - M\hat{x}_{ACoSaMP}^t$ for ACoSaMP, and $y_{resid}^t = y - M\hat{x}_{ASP}^t$ for ASP.
end while

Form the final solution $\hat{x}_{ACoSaMP} = \hat{x}_{ACoSaMP}^t$ for ACoSaMP and $\hat{x}_{ASP} = \hat{x}_{ASP}^t$ for ASP.
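Algorithm 4 can be sketched along the same lines. The code below is an illustration under stated assumptions: real matrices, plain thresholding for both cosupport selections, a fixed iteration count, the size of $\Lambda_\Delta$ taken as $a\ell$ rounded to the nearest integer, and an SVD-based null-space basis for the constrained least squares. The function names are hypothetical.

```python
import numpy as np

def null_space_basis(a_mat, tol=1e-10):
    """Orthonormal basis (columns) of {v : a_mat v = 0}, computed from the SVD."""
    d = a_mat.shape[1]
    if a_mat.shape[0] == 0:
        return np.eye(d)
    _, s, vt = np.linalg.svd(a_mat, full_matrices=True)
    rank = int(np.sum(s > tol * max(1.0, s[0])))
    return vt[rank:].T

def threshold_cosupport(omega, z, size):
    """Indices of the `size` smallest entries of |omega z|."""
    return np.sort(np.argsort(np.abs(omega @ z), kind="stable")[:size])

def constrained_least_squares(m_mat, y, omega_l):
    """argmin ||y - M v||_2 subject to omega_l v = 0."""
    basis = null_space_basis(omega_l)
    if basis.shape[1] == 0:
        return np.zeros(m_mat.shape[1])
    return basis @ np.linalg.lstsq(m_mat @ basis, y, rcond=None)[0]

def asp_acosamp(m_mat, omega, y, ell, variant="ASP", a=1.0, n_iter=30):
    """Run ASP or ACoSaMP for n_iter iterations."""
    p = omega.shape[0]
    size_delta = max(int(round(a * ell)), 0)
    cosupp = np.arange(p)                                   # Lambda^0 = all rows
    x = np.zeros(m_mat.shape[1])
    resid = y.copy()
    for _ in range(n_iter):
        lam_delta = threshold_cosupport(omega, m_mat.T @ resid, size_delta)
        lam_tilde = np.intersect1d(cosupp, lam_delta)       # shrink the cosupport
        x_p = constrained_least_squares(m_mat, y, omega[lam_tilde, :])
        cosupp = threshold_cosupport(omega, x_p, ell)       # enlarge it back to ell
        if variant == "ACoSaMP":
            basis = null_space_basis(omega[cosupp, :])
            x = basis @ (basis.T @ x_p)                     # Q_Lambda x_p
        else:
            x = constrained_least_squares(m_mat, y, omega[cosupp, :])
        resid = y - m_mat @ x
    return x

# Toy check in the unitary case Omega = I, M = I with p = d = 6 and ell = 4
x_true = np.array([0.0, 3.0, 0.0, 0.0, -2.0, 0.0])
ell, p = 4, 6
print(np.linalg.norm(asp_acosamp(np.eye(6), np.eye(6), x_true, ell, "ASP", a=1.0) - x_true))
print(np.linalg.norm(asp_acosamp(np.eye(6), np.eye(6), x_true, ell, "ACoSaMP", a=(2 * ell - p) / ell) - x_true))
```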



Since $\Lambda^t = \hat{\mathcal{S}}_\ell(\hat{x}^{t-1} + \mu^t M^*(y - M\hat{x}^{t-1}))$ and $\hat{x}^t = Q_{\Lambda^t}(x_g)$, the above requires a line search over different values of
$\mu$ and along the search $\Lambda^t$ might change several times. A simpler way is an adaptive step size selection as proposed
in [28] for IHT. In a heuristic way we limit the search to the cosupport $\tilde{\Lambda} = \hat{\mathcal{S}}_\ell(M^*(y - M\hat{x}^{t-1})) \cap \Lambda^{t-1}$. This is
the intersection of the cosupport of $\hat{x}^{t-1}$ with the $\ell$-cosparse cosupport of the estimated closest $\ell$-cosparse subspace to
$M^*(y - M\hat{x}^{t-1})$. Since $\hat{x}^{t-1} = Q_{\tilde{\Lambda}}\hat{x}^{t-1}$, finding $\mu$ turns to be

$\mu^t := \operatorname{argmin}_{\mu} \|y - M(\hat{x}^{t-1} + \mu Q_{\tilde{\Lambda}} M^*(y - M\hat{x}^{t-1}))\|_2^2$. (25)

This procedure of selecting $\mu^t$ does not require a line search and it has a simple closed form solution.
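One way to obtain the closed form for (25) is sketched below, assuming real-valued quantities and $Mg \neq 0$. The shorthand $r$ and $g$ is introduced here only for the derivation and does not appear elsewhere in the text.

```latex
% Shorthand (assumed for this derivation only):
%   r = y - M\hat{x}^{t-1},  g = Q_{\tilde\Lambda} M^* r.
\begin{align*}
f(\mu) &= \| r - \mu M g \|_2^2
        = \|r\|_2^2 - 2\mu \langle M^* r, g \rangle + \mu^2 \|M g\|_2^2, \\
\langle M^* r, g \rangle
       &= \langle M^* r, Q_{\tilde\Lambda} M^* r \rangle
        = \| Q_{\tilde\Lambda} M^* r \|_2^2 = \|g\|_2^2
        \quad \text{(since } Q_{\tilde\Lambda} = Q_{\tilde\Lambda}^* = Q_{\tilde\Lambda}^2 \text{)}, \\
f'(\mu) = 0 \;&\Longrightarrow\; \mu^t = \frac{\|g\|_2^2}{\|M g\|_2^2}.
\end{align*}
```

The objective is a convex quadratic in the scalar $\mu$, so the stationary point is the minimizer and no search is needed.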
To summarize, there are three main options for the step size selection:

• Constant step-size selection - uses a constant step size $\mu^t = \mu$ in all iterations.

• Optimal changing step-size selection - uses different values for $\mu^t$ in each iteration by minimizing $\|y - M\hat{x}^t\|_2^2$, as in (24).

• Adaptive changing step-size selection - uses (25).

ACoSaMP and ASP: Analysis CoSaMP (ACoSaMP) and analysis SP (ASP) are presented in Algorithm 4. The
stages are parallel to those of the synthesis CoSaMP and SP. We dwell a bit more on the meaning of the parameter $a$
in the algorithms. This parameter determines the size of the new cosupport $\Lambda_\Delta$ in each iteration. $a = 1$ means that
the size is $\ell$ and according to Table 1 it is equivalent to $a = 1$ in the synthesis as done in SP, in which we select $k$ new
indices for the support in each iteration. In synthesis CoSaMP we use $a = 2$ and select $2k$ new elements. $2k$ is the
maximal support size of two added $k$-sparse vectors. The corresponding size in the analysis case is $2\ell - p$ according
to Table 1. For this setting we need to choose $a = \frac{2\ell - p}{\ell}$.

3.3. The Unitary Case 

For $\Omega = I$ the synthesis and the analysis greedy-like algorithms become equivalent. This is easy to see since in
this case we have $p = d$, $k = d - \ell$, $\Lambda = T^C$, $Q_\Lambda x = x_T$ and $T_1 \cup T_2 = (\Lambda_1 \cap \Lambda_2)^C$ for $\Lambda_1 = T_1^C$ and $\Lambda_2 = T_2^C$. In addition,
$\hat{\mathcal{S}}_\ell = \mathcal{S}_\ell^*$ finds the closest $\ell$-cosparse subspace by simply taking the smallest $\ell$ elements. Using similar arguments, also
in the case where $\Omega$ is a unitary matrix the analysis methods coincide with the synthesis ones. In order to get exactly
the same algorithms $M$ is replaced with $M\Omega^*$ in the synthesis techniques and the output is multiplied by $\Omega^*$.

Based on this observation, we can deduce that in the unitary case the guarantees of the synthesis greedy-like methods
apply also for the analysis ones in a trivial way. Thus, it is tempting to assume that the latter should have similar
guarantees based on the $\Omega$-RIP in the general case as well. In the next section we develop such claims.

Before moving to the next section we mention a variation of the analysis greedy-like techniques. In AHTP,
ACoSaMP and ASP we need to solve the constrained minimization of $\|y - Mx\|_2^2$ s.t. $\|\Omega_\Lambda x\|_2^2 = 0$. For high
dimensional signals this problem is hard to solve and we suggest to replace it with minimizing $\|y - Mx\|_2^2 + \lambda \|\Omega_\Lambda x\|_2^2$,
where $\lambda$ is a relaxation constant. This results in a relaxed version of the algorithms. We refer hereafter to these
versions as relaxed AHTP (RAHTP), relaxed ASP (RASP) and relaxed ACoSaMP (RACoSaMP).
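The relaxed objective is an ordinary regularized least squares problem. A minimal sketch, assuming real matrices and a dense solver (for genuinely high dimensional signals an iterative solver would be the natural substitute), stacks the two quadratic terms into one system; the function name and the random test data are hypothetical.

```python
import numpy as np

def relaxed_projection(m_mat, omega_l, y, lam):
    """argmin_x ||y - M x||_2^2 + lam * ||Omega_Lambda x||_2^2 via stacked least squares."""
    stacked = np.vstack([m_mat, np.sqrt(lam) * omega_l])
    rhs = np.concatenate([y, np.zeros(omega_l.shape[0])])
    return np.linalg.lstsq(stacked, rhs, rcond=None)[0]

rng = np.random.default_rng(5)
m_mat = rng.standard_normal((4, 6))
omega = rng.standard_normal((8, 6))
omega_l = omega[[0, 2, 5, 7], :]        # rows of an example cosupport
y = rng.standard_normal(4)

for lam in (1e-2, 1.0, 1e2, 1e4):
    x_lam = relaxed_projection(m_mat, omega_l, y, lam)
    # The constraint violation ||Omega_Lambda x|| shrinks as lam grows
    print(lam, np.linalg.norm(omega_l @ x_lam))
```

Larger values of the relaxation constant push the solution toward the constrained one, at the price of a worse conditioned system.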



4. Algorithms Guarantees 

In this section we provide theoretical guarantees for the reconstruction performance of the analysis greedy-like
methods. For AIHT and AHTP we study both the constant step-size and the optimal step-size selections. For
ACoSaMP and ASP the analysis is made for $a = \frac{2\ell - p}{\ell}$, but we believe that it can be extended also to other values
of $a$, such as $a = 1$. The performance guarantees we provide are summarized in the following two theorems. The first
theorem, for AIHT and AHTP, is a simplified version of Theorem 4.4 and the second theorem, for ASP and ACoSaMP,
is a combination of Corollaries 4.8 and 4.13, all of which appear hereafter along with their proofs.

Theorem 4.1 (Stable Recovery of AIHT and AHTP). Consider the problem P and apply either AIHT or AHTP
with a certain constant step-size or an optimal changing step-size, obtaining $\hat{x}^t$ after $t$ iterations. If

$\frac{(C_\ell - 1)\sigma_M^2}{C_\ell} < 1$ (26)

and $\delta_{2\ell-p} \le \delta_1(C_\ell, \sigma_M^2)$, where $\delta_1(C_\ell, \sigma_M^2)$ is a constant guaranteed to be greater than zero whenever (26) is satisfied,
then after a finite number of iterations $t^*$

$\|x - \hat{x}^{t^*}\|_2 \le c_1 \|e\|_2$, (27)

implying that these algorithms lead to a stable recovery. The constant $c_1$ is a function of $\delta_{2\ell-p}$, $C_\ell$ and $\sigma_M^2$, and the
constant step-size used is dependent on $\delta_1(C_\ell, \sigma_M^2)$.

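Theorem 4.2 (Stable Recovery of ASP and ACoSaMP). Consider the problem P and apply either ACoSaMP or
ASP with $a = \frac{2\ell - p}{\ell}$, obtaining $\hat{x}^t$ after $t$ iterations. If

$(1 + C_{\mathcal{S}})\left(1 - \left(C_{\mathcal{S}} - (C_{\mathcal{S}} - 1)\sigma_M^2\right)\right) < 1$, (28)

and

$\delta_{4\ell-3p} \le \delta_2(C_{\mathcal{S}}, \sigma_M^2)$,

where $C_{\mathcal{S}} = \max(C_\ell, C_{2\ell-p})$ and $\delta_2(C_{\mathcal{S}}, \sigma_M^2)$ is a constant guaranteed to be greater than zero whenever (28) is satisfied,
then after a finite number of iterations $t^*$

$\|x - \hat{x}^{t^*}\|_2 \le c_2 \|e\|_2$, (29)

implying that these algorithms lead to a stable recovery. The constant $c_2$ is a function of $\delta_{4\ell-3p}$, $C_\ell$, $C_{2\ell-p}$ and $\sigma_M^2$.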

Before we proceed to the proofs, let us comment on the constants in the above theorems. Their values can be calculated using Theorem 4.4 and Corollaries 4.8 and 4.13. In the case where $\Omega$ is a unitary matrix, (26) and (28) are trivially satisfied since $C_\ell = C_{2\ell-p} = 1$. In this case the $\Omega$-RIP conditions become $\delta_{2\ell-p} < \delta_1(1, \sigma_M^2) = 1/3$ for AIHT and AHTP, and $\delta_{4\ell-3p} < \delta_2(1, \sigma_M^2) = 0.0156$ for ACoSaMP and ASP. In terms of synthesis RIP for $M\Omega^*$, the condition $\delta_{2\ell-p} < 1/3$ parallels $\delta_{2k}(M\Omega^*) < 1/3$ and similarly $\delta_{4\ell-3p} < 0.0156$ parallels $\delta_{4k}(M\Omega^*) < 0.0156$. Note that the condition we pose for AIHT and AHTP in this case is the same as the one presented for synthesis IHT with a constant step size [15].

In the non-unitary case, the value of $\sigma_M$ plays a vital role, though we believe that this is just an artifact of our proof technique. For a random Gaussian matrix whose entries are i.i.d. with zero mean and variance $\frac{1}{m}$, $\sigma_M^2$ behaves like $\left(1 + \sqrt{d/m}\right)^2$. This is true also for other types of distributions for which the fourth moment is known to be bounded. For example, for $d/m = 1.5$ we have found empirically that $\sigma_M^2 \simeq 5$. In this case we need $C_\ell < \frac{5}{4}$ for (26) to hold and $C_{\mathcal{S}} < 1.118$ for (28) to hold, and both are quite demanding on the quality of the near-optimal projection. For $C_\ell = C_{\mathcal{S}} = 1.05$ we have the conditions $\delta_{2\ell-p} < 0.289$ for AIHT and AHTP, and $\delta_{4\ell-3p} < 0.0049$ for ACoSaMP and ASP; and for $C_\ell = C_{\mathcal{S}} = 1.1$ we have $\delta_{2\ell-p} < 0.24$ for AIHT and AHTP and $\delta_{4\ell-3p} < 0.00032$ for ACoSaMP and ASP.
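As a rough numerical check of the figures quoted above, the following sketch assumes that (26) reads $(C_\ell-1)\sigma_M^2/C_\ell < 1$, that (28) reads $(1+C_{\mathcal{S}})(1-(C_{\mathcal{S}}-(C_{\mathcal{S}}-1)\sigma_M^2)) < 1$, and that the AIHT/AHTP requirement of Theorem 4.4 becomes, in the limit $\eta \to \infty$, $1+\delta < (1+\sqrt{1-r/(1-\delta)})(1-\delta)$ with $r = (C_\ell-1)\sigma_M^2/C_\ell$. The function names are illustrative and not part of the paper.

```python
import math

def max_c_for_26(sigma2):
    """Largest C_l allowed by (C - 1) * sigma2 / C < 1 (assumed form of (26)); needs sigma2 > 1."""
    return sigma2 / (sigma2 - 1.0)

def max_c_for_28(sigma2):
    """Largest C_S allowed by the assumed form of (28); needs sigma2 > 1.
    Its left-hand side equals (C^2 - 1) * (sigma2 - 1), so C^2 < sigma2 / (sigma2 - 1)."""
    return math.sqrt(sigma2 / (sigma2 - 1.0))

def delta1(c, sigma2, tol=1e-12):
    """Supremum of delta satisfying the assumed eta -> infinity condition, found by bisection."""
    r = (c - 1.0) * sigma2 / c
    if r >= 1.0:
        return 0.0  # (26) fails, no admissible delta

    def gap(delta):
        inner = max(0.0, 1.0 - r / (1.0 - delta))
        return (1.0 + math.sqrt(inner)) * (1.0 - delta) - (1.0 + delta)

    lo, hi = 0.0, 1.0 - r  # gap is decreasing, positive at lo and negative at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo

sigma2 = 5.0  # empirical value quoted in the text for d/m = 1.5
print(max_c_for_26(sigma2), max_c_for_28(sigma2))
print(delta1(1.0, sigma2), delta1(1.05, sigma2), delta1(1.1, sigma2))
```

Under these assumptions the computed thresholds land on the values stated in the text (5/4 and 1.118 for the projection constants; 1/3, 0.289 and 0.24 for $\delta_{2\ell-p}$), which is a useful sanity check on the reading of the conditions.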

As in the synthesis case, the $\Omega$-RIP requirements for the theoretical bounds of AIHT and AHTP are better than those for ACoSaMP and ASP. In addition, in the migration from the synthesis to the analysis we lost more precision in the bounds for ACoSaMP and ASP than in those of AIHT and AHTP. In particular, even in the case where $\Omega$ is the identity we do not coincide with any of the synthesis parallel RIP reference constants. We should also remember that the synthesis bound for SP is in terms of $\delta_{3k}$ and not $\delta_{4k}$ Hi lH . Thus, we expect that it will be possible to give a condition for ASP in terms of $\delta_{3\ell-2p}$ with better reference constants. However, our main interest in this work is to show the existence of such bounds, and in Section 4.5 we dwell more on their meaning.

We should note that here and elsewhere we can replace the conditions on $\delta_{2\ell-p}$ and $\delta_{4\ell-3p}$ in the theorems with conditions on $\delta^{corank}_{2r-p}$ and $\delta^{corank}_{4r-3p}$; the proofs will be almost the same^. In this case we will be analyzing a version of the algorithms which is driven by the corank instead of the cosparsity. This would mean we need the near-optimal projection to be in terms of the corank. In the case where $\Omega$ is in a general position, there is no difference between the cosparsity $\ell$ and the corank $r$. However, when we have linear dependencies in $\Omega$ the two measures differ and an $\ell$-cosparse vector is not necessarily a vector with a corank $r$.

As we will see hereafter, our recovery conditions require $\delta_{2\ell-p}$ and $\delta_{4\ell-3p}$ to be as small as possible and for this we need $2\ell - p$ and $4\ell - 3p$ to be as large as possible. Thus, we need $\ell$ to be as close as possible to $p$ and for highly redundant $\Omega$ this cannot be achieved without having linear dependencies in $\Omega$. Apart from the theoretical advantage of linear dependencies in $\Omega$, we also show empirically that an analysis dictionary with linear dependencies has a better recovery rate than an analysis dictionary in a general position of the same dimension. Thus, we deduce that linear dependencies in $\Omega$ lead to better bounds and restoration performance.

Though linear dependencies allow $\ell$ to be larger than $d$ and be in the order of $p$, the value of the corank is always bounded by $d$ and cannot be expected to be large enough for highly redundant analysis dictionaries. In addition, we will see hereafter that the number of measurements $m$ required by the $\Omega$-RIP is strongly dependent on $\ell$ and less affected by the value of $r$. From the computational point of view we note also that using the corank requires its computation in each iteration, which increases the overall complexity of the algorithms. Thus, it is more reasonable to have conditions on $\delta_{2\ell-p}$ and $\delta_{4\ell-3p}$ than on $\delta^{corank}_{2r-p}$ and $\delta^{corank}_{4r-3p}$, and our study will be focused on the cosparsity based algorithms.

4.1. AIHT and AHTP Guarantees

A uniform guarantee for AIHT in the case that an optimal projection is given is presented in [27]. The work in [27] dealt with a general union of subspaces, and assumed that $M$ is bi-Lipschitz on the considered union of subspaces. In our case the union is the set of $\ell$-cosparse vectors, and the bi-Lipschitz constants of $M$ are the largest $B_L$ and smallest $B_U$, where $0 < B_L \le B_U$, such that for all $\ell$-cosparse vectors $v_1, v_2$:

$$B_L \|v_1 + v_2\|_2^2 \le \|M(v_1 + v_2)\|_2^2 \le B_U \|v_1 + v_2\|_2^2. \qquad (30)$$

Under this assumption, one can apply Theorem 2 from [27] to the idealized AIHT that has access to an optimal projection and uses a constant step size $\mu^t = \mu$. Relying on Table 1 we present this theorem and replace $B_L$ and $B_U$ with $1 - \delta_{2\ell-p}$ and $1 + \delta_{2\ell-p}$ respectively.



Theorem 4.3 (Theorem 2 in [27]). Consider the problem $\mathcal{P}$ with $C_\ell = 1$ and apply AIHT with a constant step size $\mu$. If $1 + \delta_{2\ell-p} \le \frac{1}{\mu} < 1.5(1 - \delta_{2\ell-p})$ then after a finite number of iterations $t^*$

$$\|x - \hat{x}^{t^*}\|_2 \le c_3 \|e\|_2, \qquad (31)$$

implying that AIHT leads to a stable recovery. The constant $c_3$ is a function of $\delta_{2\ell-p}$ and $\mu$.



^At a first glance one would think that the conditions should be in terms of $\delta^{corank}_{2r-d}$ and $\delta^{corank}_{4r-3d}$. However, given two cosparse vectors with coranks $r_1$ and $r_2$, the best estimation we can have for the corank of their sum is $r_1 + r_2 - p$.

In this work we extend the above in several ways: First, we refer to the case where an optimal projection is not known, and show that guarantees of the same flavor apply for a near-optimal projection^. The price we seemingly have to pay is that $\sigma_M$ enters the game. Second, we derive similar results for the AHTP method. Finally, we also consider the optimal step size and show that the same performance guarantees hold true in that case.

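Theorem 4.4. Consider the problem $\mathcal{P}$ and apply either AIHT or AHTP with a constant step size $\mu$ or an optimal changing step size. For a positive constant $\eta > 0$, let

$$b_1 := \frac{\eta}{1+\eta} \quad \text{and} \quad b_2 := \frac{(C_\ell - 1)\sigma_M^2 b_1^2}{C_\ell(1 - \delta_{2\ell-p})}.$$

Suppose $\frac{b_2}{b_1^2} = \frac{(C_\ell - 1)\sigma_M^2}{C_\ell(1 - \delta_{2\ell-p})} < 1$, $1 + \delta_{2\ell-p} \le \frac{1}{\mu} < \left(1 + \sqrt{1 - \frac{b_2}{b_1^2}}\right) b_1 (1 - \delta_{2\ell-p})$ and $\frac{1}{\mu} \le \sigma_M^2$. Then for

$$t \ge t^* := \frac{\log\left(\eta^2 \|e\|_2^2 / \|y\|_2^2\right)}{\log(c_4)},$$

with $c_4$ as defined in Lemma 4.6,

$$\|x - \hat{x}^t\|_2^2 \le \frac{(1+\eta)^2}{1 - \delta_{2\ell-p}} \|e\|_2^2, \qquad (32)$$

implying that AIHT and AHTP lead to a stable recovery. Note that for an optimal changing step-size we set $\mu = \frac{1}{1+\delta_{2\ell-p}}$ in $t^*$ and the conditions of the theorem are simply $\frac{b_2}{b_1^2} < 1$ and $1 + \delta_{2\ell-p} \le \left(1 + \sqrt{1 - \frac{b_2}{b_1^2}}\right) b_1 (1 - \delta_{2\ell-p})$.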



This theorem is the parallel to Theorem 2.1 in [15] for IHT. A few remarks are in order on the nature of the theorem, especially in regard to the constant $\eta$. One can view $\eta$ as giving a trade-off between satisfying the theorem conditions and the amplification of the noise. In particular, one may consider that the above theorem proves the convergence result for the noiseless case by taking $\eta$ to infinity; one can imagine solving the problem $\mathcal{P}$ where $e \to 0$, and applying the theorem with an appropriately chosen $\eta$ which approaches infinity. It is indeed possible to show that the iterates of AIHT and AHTP converge to $x$ when there is no noise. However, we will not give a separate proof since the basic idea of the arguments is the same for both cases.



We will prove the theorem by proving two key lemmas first. The proof technique is based on ideas from [15] and [27]. Recall that the two iterative algorithms try to reduce the objective $\|y - M\hat{x}^t\|_2^2$ over the iterations $t$. Thus, the progress of the algorithms can be indirectly measured by how much the objective $\|y - M\hat{x}^t\|_2^2$ is reduced at each iteration $t$. The two lemmas that we present capture this idea. The first lemma is similar to Lemma 3 in [27] and relates $\|y - M\hat{x}^t\|_2^2$ to $\|y - M\hat{x}^{t-1}\|_2^2$ and similar quantities at iteration $t - 1$. We remark that the constraint $\frac{1}{\mu} \le \sigma_M^2$ in Theorem 4.4 may not be necessary and is added only for having a simpler derivation of the results in this theorem. Furthermore, this is a very mild condition compared to $\frac{1}{\mu} < \left(1 + \sqrt{1 - \frac{b_2}{b_1^2}}\right) b_1 (1 - \delta_{2\ell-p})$ and can only limit the range of values that can be used with the constant step size versions of the algorithms.



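Lemma 4.5. Consider the problem $\mathcal{P}$ and apply either AIHT or AHTP with a constant step size $\mu$ satisfying $\frac{1}{\mu} \ge 1 + \delta_{2\ell-p}$ or an optimal step size, for which we set $\mu = \frac{1}{1+\delta_{2\ell-p}}$ in the bound below. Then, at the $t$-th iteration, the following holds:

$$\|y - M\hat{x}^t\|_2^2 - \|y - M\hat{x}^{t-1}\|_2^2 \le C_\ell\left(\|y - Mx\|_2^2 - \|y - M\hat{x}^{t-1}\|_2^2\right) + C_\ell\left(\frac{1}{\mu(1 - \delta_{2\ell-p})} - 1\right)\|M(x - \hat{x}^{t-1})\|_2^2 + (C_\ell - 1)\mu\sigma_M^2 \|y - M\hat{x}^{t-1}\|_2^2. \qquad (33)$$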



^Remark that we even improve the condition of the idealized case in [27] to be $\delta_{2\ell-p} \le \frac{1}{3}$ instead of $\delta_{2\ell-p} \le \frac{1}{5}$.

The proof of the above lemma appears in Appendix A. The second lemma is built on the result of Lemma 4.5. It shows that once the objective $\|y - M\hat{x}^{t-1}\|_2^2$ at iteration $t - 1$ is small enough, then we are guaranteed to have a small $\|y - M\hat{x}^t\|_2^2$ as well. Given the presence of noise, this is quite natural; one cannot expect it to approach 0 but may expect it not to become worse. Moreover, the lemma also shows that if $\|y - M\hat{x}^{t-1}\|_2^2$ is not small, then the objective in iteration $t$ is necessarily reduced by a constant factor.

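Lemma 4.6. Suppose that the same conditions of Theorem 4.4 hold true. If $\|y - M\hat{x}^{t-1}\|_2^2 \le \eta^2 \|e\|_2^2$, then $\|y - M\hat{x}^t\|_2^2 \le \eta^2 \|e\|_2^2$. Furthermore, if $\|y - M\hat{x}^{t-1}\|_2^2 > \eta^2 \|e\|_2^2$ then

$$\|y - M\hat{x}^t\|_2^2 \le c_4 \|y - M\hat{x}^{t-1}\|_2^2, \qquad (34)$$

where

$$c_4 := \left(1 + \frac{1}{\eta}\right)^2 \left(\frac{1}{\mu(1 - \delta_{2\ell-p})} - 1\right) C_\ell + (C_\ell - 1)(\mu\sigma_M^2 - 1) + \frac{C_\ell}{\eta^2} < 1.$$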



Having the two lemmas above, the proof of the theorem is straightforward. 

Proof: [Proof of Theorem 4.4] When we initialize $\hat{x}^0 = 0$, we have $\|y - M\hat{x}^0\|_2^2 = \|y\|_2^2$. Assuming that $\|y\|_2 > \eta \|e\|_2$ and applying Lemma 4.6 repeatedly, we obtain

$$\|y - M\hat{x}^t\|_2^2 \le \max\left(c_4^t \|y\|_2^2, \eta^2 \|e\|_2^2\right).$$

Since $c_4^t \|y\|_2^2 \le \eta^2 \|e\|_2^2$ for $t \ge t^*$, we have simply

$$\|y - M\hat{x}^t\|_2^2 \le \eta^2 \|e\|_2^2 \qquad (35)$$

for $t \ge t^*$. Finally, we observe

$$\|x - \hat{x}^t\|_2^2 \le \frac{1}{1 - \delta_{2\ell-p}} \|M(x - \hat{x}^t)\|_2^2 \qquad (36)$$

and, by the triangle inequality,

$$\|M(x - \hat{x}^t)\|_2 \le \|y - M\hat{x}^t\|_2 + \|e\|_2. \qquad (37)$$

By plugging (35) into (37) and then the resulting inequality into (36), the result of the Theorem follows. □
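The final substitution can be spelled out as a short chain. This is only a worked restatement, assuming that (35) reads $\|y - M\hat{x}^t\|_2^2 \le \eta^2\|e\|_2^2$ and that (36) uses the lower $\Omega$-RIP constant $1 - \delta_{2\ell-p}$:

```latex
\begin{align*}
\|x - \hat{x}^t\|_2^2
  &\le \frac{1}{1 - \delta_{2\ell-p}} \, \|M(x - \hat{x}^t)\|_2^2
     && \text{by (36)} \\
  &\le \frac{1}{1 - \delta_{2\ell-p}} \left( \|y - M\hat{x}^t\|_2 + \|e\|_2 \right)^2
     && \text{by (37)} \\
  &\le \frac{(1 + \eta)^2}{1 - \delta_{2\ell-p}} \, \|e\|_2^2
     && \text{by (35), for } t \ge t^* .
\end{align*}
```

The last line is the bound (32) of Theorem 4.4, which makes the role of $\eta$ as a noise amplification factor explicit.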

As we have seen, the above AIHT and AHTP results hold for the cases of using a constant or an optimal changing step size. The advantage of using an optimal one is that we do not need to find a $\mu$ that satisfies the conditions of the theorem - the knowledge that such a $\mu$ exists is enough. However, its disadvantage is the additional computational complexity it introduces. In Section 3 we have introduced a third option of using an approximated adaptive step size. In the next section we shall demonstrate this option in simulations, showing that it leads to the same reconstruction result as the optimal selection method. Note, however, that our theoretical guarantees do not cover this case.



4.2. ACoSaMP Guarantees 

Having the results for AIHT and AHTP we turn to ACoSaMP and ASP. We start with a theorem for ACoSaMP. 
Its proof is based on the proof for CoSaMP in 

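Theorem 4.7. Consider the problem $\mathcal{P}$ and apply ACoSaMP with $a = \frac{2\ell-p}{\ell}$. Let $C_{\mathcal{S}} = \max(C_\ell, C_{2\ell-p})$ and suppose that there exists $\gamma > 0$ such that

$$(1 + C_{\mathcal{S}})\left(1 - \left(\frac{C_{\mathcal{S}}}{(1+\gamma)^2} - (C_{\mathcal{S}} - 1)\sigma_M^2\right)\right) < 1. \qquad (38)$$

Then, there exists $\delta_{ACoSaMP}(C_{\mathcal{S}}, \sigma_M^2, \gamma) > 0$ such that, whenever $\delta_{4\ell-3p} \le \delta_{ACoSaMP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$, the $t$-th iteration of the algorithm satisfies

$$\|x - \hat{x}^t\|_2 \le \rho_1\rho_2 \|x - \hat{x}^{t-1}\|_2 + (\eta_1 + \rho_1\eta_2) \|e\|_2, \qquad (39)$$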



where 



^ (1+ VQ) VI + ^3{^2p 

^ 1 A ' 

2^/ 1+ 63C-2,, (1 + S2e-,,)C2e-p (C2C-P - 1)(1 +y)o'M \ 
^ I r(l H-a) 7(1 +a)(l +7) (1 +a)(l +7)7 /' 

2^1+ 26A[-ip VQ + Q 



Moreover, $\rho_1^2\rho_2^2 < 1$, i.e., the iterates converge.

The constant $\gamma$ plays a similar role to the constant $\eta$ of Theorem 4.4. It gives a tradeoff between satisfying the theorem conditions and the noise amplification. However, as opposed to $\eta$, the conditions for the noiseless case are achieved when $\gamma$ tends to zero. An immediate corollary of the above theorem is the following.

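Corollary 4.8. Consider the problem $\mathcal{P}$ and apply ACoSaMP with $a = \frac{2\ell-p}{\ell}$. If (38) holds and $\delta_{4\ell-3p} < \delta_{ACoSaMP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$, where $C_{\mathcal{S}}$ and $\gamma$ are as in Theorem 4.7 and $\delta_{ACoSaMP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$ is a constant guaranteed to be greater than zero whenever (28) is satisfied, then for any

$$t \ge t^* = \left\lceil \frac{\log(\|x\|_2 / \|e\|_2)}{\log(1/(\rho_1\rho_2))} \right\rceil,$$

$$\|x - \hat{x}^t_{ACoSaMP}\|_2 \le \left(1 + \frac{1 - (\rho_1\rho_2)^t}{1 - \rho_1\rho_2}(\eta_1 + \rho_1\eta_2)\right)\|e\|_2, \qquad (40)$$

implying that ACoSaMP leads to a stable recovery. The constants $\eta_1$, $\eta_2$, $\rho_1$ and $\rho_2$ are the same as in Theorem 4.7.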
Proof: By using (39) and recursion we have that after $t^*$ iterations

$$\|x - \hat{x}^{t^*}_{ACoSaMP}\|_2 \le (\rho_1\rho_2)^{t^*} \|x - \hat{x}^0_{ACoSaMP}\|_2 + \left(1 + \rho_1\rho_2 + (\rho_1\rho_2)^2 + \cdots + (\rho_1\rho_2)^{t^*-1}\right)(\eta_1 + \rho_1\eta_2)\|e\|_2. \qquad (41)$$

Since $\hat{x}^0_{ACoSaMP} = 0$, after $t^*$ iterations, one has

$$(\rho_1\rho_2)^{t^*} \|x - \hat{x}^0_{ACoSaMP}\|_2 = (\rho_1\rho_2)^{t^*} \|x\|_2 \le \|e\|_2. \qquad (42)$$

By using the equation of geometric series with (41) and plugging (42) into it, we get (40). □
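The recursion argument can be illustrated numerically. The sketch below is not from the paper: it assumes the per-iteration bound has the form $a_t \le \rho\, a_{t-1} + c\,\|e\|_2$ with $\rho = \rho_1\rho_2 < 1$ and $c = \eta_1 + \rho_1\eta_2$, uses made-up values for these constants, and compares the unrolled recursion with the closed-form expression of (40).

```python
import math

def iterations_needed(x_norm, e_norm, rho):
    """t* = ceil(log(||x|| / ||e||) / log(1 / rho)), with rho = rho1 * rho2 < 1."""
    if x_norm <= e_norm:
        return 0
    return math.ceil(math.log(x_norm / e_norm) / math.log(1.0 / rho))

def unrolled_bound(t, x_norm, e_norm, rho, c):
    """Apply a_t = rho * a_{t-1} + c * ||e|| for t steps, starting from a_0 = ||x||."""
    a = x_norm
    for _ in range(t):
        a = rho * a + c * e_norm
    return a

def closed_form_bound(t, e_norm, rho, c):
    """(1 + (1 - rho^t) / (1 - rho) * c) * ||e||, valid once rho^t * ||x|| <= ||e||."""
    return (1.0 + (1.0 - rho ** t) / (1.0 - rho) * c) * e_norm

# Hypothetical values, chosen only to exercise the formulas.
x_norm, e_norm, rho, c = 10.0, 0.1, 0.5, 2.0
t_star = iterations_needed(x_norm, e_norm, rho)
print(t_star, unrolled_bound(t_star, x_norm, e_norm, rho, c),
      closed_form_bound(t_star, e_norm, rho, c))
```

For every $t \ge t^*$ the unrolled value stays below the closed form, since the only slack comes from replacing $\rho^t \|x\|_2$ by $\|e\|_2$.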

We turn now to prove the theorem. Instead of presenting the proof directly, we divide the proof into several lemmas. The first lemma gives a bound for $\|x - x_p\|_2$ as a function of $\|e\|_2$ and $\|P_{\tilde\Lambda^t}(x - x_p)\|_2$.

Lemma 4.9. Consider the problem $\mathcal{P}$ and apply ACoSaMP with $a = \frac{2\ell-p}{\ell}$. For each iteration we have

$$\|x - x_p\|_2 \le \frac{1}{\sqrt{1 - \delta_{4\ell-3p}^2}} \|P_{\tilde\Lambda^t}(x - x_p)\|_2 + \frac{\sqrt{1 + \delta_{3\ell-2p}}}{1 - \delta_{4\ell-3p}} \|e\|_2. \qquad (43)$$

The second lemma bounds $\|x - \hat{x}^t_{ACoSaMP}\|_2$ in terms of $\|P_{\tilde\Lambda^t}(x - x_p)\|_2$ and $\|e\|_2$ using the first lemma.



Lemma 4.10. Consider the problem $\mathcal{P}$ and apply ACoSaMP with $a = \frac{2\ell-p}{\ell}$. For each iteration we have

$$\|x - \hat{x}^t\|_2 \le \rho_1 \|P_{\tilde\Lambda^t}(x - x_p)\|_2 + \eta_1 \|e\|_2, \qquad (44)$$

where $\eta_1$ and $\rho_1$ are the same constants as in Theorem 4.7.
The last lemma bounds $\|P_{\tilde\Lambda^t}(x - x_p)\|_2$ with $\|x - \hat{x}^{t-1}_{ACoSaMP}\|_2$ and $\|e\|_2$.

Lemma 4.11. Consider the problem $\mathcal{P}$ and apply ACoSaMP with $a = \frac{2\ell-p}{\ell}$. If

$$C_{2\ell-p} < \frac{\sigma_M^2(1+\gamma)^2}{\sigma_M^2(1+\gamma)^2 - 1}, \qquad (45)$$

then there exists $\delta_{ACoSaMP}(C_{2\ell-p}, \sigma_M^2, \gamma) > 0$ such that for any $\delta_{2\ell-p} < \delta_{ACoSaMP}(C_{2\ell-p}, \sigma_M^2, \gamma)$

$$\|P_{\tilde\Lambda^t}(x - x_p)\|_2 \le \eta_2 \|e\|_2 + \rho_2 \|x - \hat{x}^{t-1}\|_2. \qquad (46)$$

The constants $\eta_2$ and $\rho_2$ are as defined in Theorem 4.7.

The proofs of Lemmas 4.9, 4.10 and 4.11 appear in Appendices C, D and E respectively. With the aid of the above three lemmas we turn to prove Theorem 4.7.

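Proof: [Proof of Theorem 4.7] Remark that since $1 + C_{\mathcal{S}} > 1$ we have that (38) implies $\frac{C_{\mathcal{S}}}{(1+\gamma)^2} - (C_{\mathcal{S}} - 1)\sigma_M^2 > 0$. Because of that the condition in (45) in Lemma 4.11 holds. Substituting the inequality of Lemma 4.11 into the inequality of Lemma 4.10 gives (39). The iterates converge if $\rho_1^2\rho_2^2 = \frac{1 + 2\delta_{4\ell-3p}\sqrt{C_\ell} + C_\ell}{1 - \delta_{4\ell-3p}^2}\rho_2^2 < 1$. By noticing that $\rho_2^2 < 1$ it is enough to require $\frac{1 + C_\ell}{1 - \delta_{4\ell-3p}^2}\rho_2^2 + \frac{2\delta_{4\ell-3p}\sqrt{C_\ell}}{1 - \delta_{4\ell-3p}^2} < 1$. The last is equivalent to

$$(1 + C_\ell)\left(1 - \left(\sqrt{\delta_{4\ell-3p}} - \sqrt{\frac{C_{2\ell-p}}{(1+\gamma)^2}\left(1 - \sqrt{\delta_{2\ell-p}}\right)^2 - (C_{2\ell-p} - 1)(1 + \delta_{2\ell-p})\sigma_M^2}\right)^2\right) + 2\delta_{4\ell-3p}\sqrt{C_\ell} - 1 + \delta_{4\ell-3p}^2 < 0. \qquad (47)$$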



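It is easy to verify that $\zeta(C, \delta) := \frac{C}{(1+\gamma)^2}\left(1 - \sqrt{\delta}\right)^2 - (C - 1)(1 + \delta)\sigma_M^2$ is a decreasing function of both $\delta$ and $C$ for $0 \le \delta \le 1$ and $C > 1$. Since $1 \le C_{2\ell-p} \le C_{\mathcal{S}}$ and $\delta \ge 0$ we have that $\zeta(C_{2\ell-p}, \delta_{4\ell-3p}) \le \zeta(1, 0) = \frac{1}{(1+\gamma)^2} \le 1$ and $\zeta(C_{2\ell-p}, \delta_{2\ell-p}) \ge \zeta(C_{\mathcal{S}}, \delta_{4\ell-3p})$. Thus we have that $-\left(\sqrt{\delta_{4\ell-3p}} - \sqrt{\zeta(C_{2\ell-p}, \delta_{2\ell-p})}\right)^2 \le -\delta_{4\ell-3p} + 2\sqrt{\delta_{4\ell-3p}} - \zeta(C_{\mathcal{S}}, \delta_{4\ell-3p})$. Combining this with the fact that $C_\ell \le C_{\mathcal{S}}$ provides the following guarantee for $\rho_1^2\rho_2^2 < 1$:

$$(1 + C_{\mathcal{S}})\left(1 - \delta_{4\ell-3p} + 2\sqrt{\delta_{4\ell-3p}} - \frac{C_{\mathcal{S}}}{(1+\gamma)^2}\left(1 - 2\sqrt{\delta_{4\ell-3p}} + \delta_{4\ell-3p}\right) + (C_{\mathcal{S}} - 1)(1 + \delta_{4\ell-3p})\sigma_M^2\right) + 2\delta_{4\ell-3p}\sqrt{C_{\mathcal{S}}} - 1 + \delta_{4\ell-3p}^2 < 0. \qquad (48)$$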



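Let us now assume that $\delta_{4\ell-3p} \le \frac{1}{2}$. This necessarily means that $\delta_{ACoSaMP}(C_{\mathcal{S}}, \sigma_M^2, \gamma) \le \frac{1}{2}$ in the end. This assumption implies $\delta_{4\ell-3p}^2 \le \frac{1}{2}\delta_{4\ell-3p}$. Using this and gathering coefficients, we now consider the condition

$$(1 + C_{\mathcal{S}})\left(1 - \frac{C_{\mathcal{S}}}{(1+\gamma)^2} + (C_{\mathcal{S}} - 1)\sigma_M^2\right) - 1 + 2(1 + C_{\mathcal{S}})\left(1 + \frac{C_{\mathcal{S}}}{(1+\gamma)^2}\right)\sqrt{\delta_{4\ell-3p}} + \left((1 + C_{\mathcal{S}})\left(-1 - \frac{C_{\mathcal{S}}}{(1+\gamma)^2} + (C_{\mathcal{S}} - 1)\sigma_M^2\right) + 2\sqrt{C_{\mathcal{S}}} + \frac{1}{2}\right)\delta_{4\ell-3p} < 0. \qquad (49)$$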



The expression on the LHS is a quadratic function of $\sqrt{\delta_{4\ell-3p}}$. Note that since (38) holds the constant term in the quadratic function is negative. This guarantees the existence of a range of values $[0, \delta_{ACoSaMP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)]$ for $\delta_{4\ell-3p}$ for which (49) holds, where $\delta_{ACoSaMP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$ is the square of the positive solution of the quadratic function. In case of two positive solutions we should take the smallest among them - in this case the coefficient of $\delta_{4\ell-3p}$ in (49) will be positive.

Looking back at the proof of the theorem, we observe that the value of the constant $\delta_{ACoSaMP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$ can potentially be improved: at the beginning of the proof, we have used $\rho_2^2 \le 1$. By the end, we obtained $\rho_2^2 \le \rho_1^{-2} \le 0.25$ since $\rho_1 > 2$. If we were to use this bound at the beginning, we would have obtained a better constant $\delta_{ACoSaMP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$. □



4.3. ASP Guarantees 

Having the result of ACoSaMP we turn to derive a similar result for ASP. The technique for deriving a result for 
ASP based on the result of ACoSaMP is similar to the one we used to derive a result for AHTP from the result of 
AIHT. 

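Theorem 4.12. Consider the problem $\mathcal{P}$ and apply ASP with $a = \frac{2\ell-p}{\ell}$. If (38) holds and $\delta_{4\ell-3p} \le \delta_{ASP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$, where $C_{\mathcal{S}}$ and $\gamma$ are as in Theorem 4.7 and $\delta_{ASP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$ is a constant guaranteed to be greater than zero whenever (38) is satisfied, then the $t$-th iteration of the algorithm satisfies

$$\|x - \hat{x}^t_{ASP}\|_2 \le \sqrt{\frac{1 + \delta_{2\ell-p}}{1 - \delta_{2\ell-p}}}\,\rho_1\rho_2 \|x - \hat{x}^{t-1}_{ASP}\|_2 + \left(\sqrt{\frac{1 + \delta_{2\ell-p}}{1 - \delta_{2\ell-p}}}(\eta_1 + \rho_1\eta_2) + \frac{2}{\sqrt{1 - \delta_{2\ell-p}}}\right)\|e\|_2, \qquad (50)$$

and the iterates converge, i.e., $\frac{1 + \delta_{2\ell-p}}{1 - \delta_{2\ell-p}}\rho_1^2\rho_2^2 < 1$. The constants $\eta_1$, $\eta_2$, $\rho_1$ and $\rho_2$ are the same as in Theorem 4.7.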
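Proof: We first note that according to the selection rule of $\hat{x}^t_{ASP}$ we have that

$$\|y - M\hat{x}^t_{ASP}\|_2 \le \|y - MQ_{\hat\Lambda^t}x_p\|_2.$$

Using the triangle inequality and the fact that $y = Mx + e$ for both the LHS and the RHS we have

$$\|M(x - \hat{x}^t_{ASP})\|_2 - \|e\|_2 \le \|M(x - Q_{\hat\Lambda^t}x_p)\|_2 + \|e\|_2.$$

Using the $\Omega$-RIP property of $M$ with the fact that $x$, $\hat{x}^t_{ASP}$ and $Q_{\hat\Lambda^t}x_p$ are $\ell$-cosparse we have

$$\|x - \hat{x}^t_{ASP}\|_2 \le \sqrt{\frac{1 + \delta_{2\ell-p}}{1 - \delta_{2\ell-p}}}\,\|x - Q_{\hat\Lambda^t}x_p\|_2 + \frac{2}{\sqrt{1 - \delta_{2\ell-p}}}\|e\|_2. \qquad (51)$$

Noticing that $Q_{\hat\Lambda^t}x_p$ is the solution we get in one iteration of ACoSaMP with initialization of $\hat{x}^{t-1}_{ASP}$, we can combine the above with the result of Theorem 4.7, getting (50). For $\frac{1 + \delta_{2\ell-p}}{1 - \delta_{2\ell-p}}\rho_1^2\rho_2^2 < 1$ to hold we need that

$$\frac{1 + 2\delta_{4\ell-3p}\sqrt{C_\ell} + C_\ell}{(1 - \delta_{4\ell-3p})^2}\,\rho_2^2 < 1. \qquad (52)$$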



Remark that the above differs from what we have for ACoSaMP only in the denominator of the first element in the LHS. In ACoSaMP $1 - \delta_{4\ell-3p}^2$ appears instead of $(1 - \delta_{4\ell-3p})^2$. Thus, using a similar process to the one in the proof of ACoSaMP we can show that (52) holds if the following holds

$$(1 + C_{\mathcal{S}})\left(1 - \frac{C_{\mathcal{S}}}{(1+\gamma)^2} + (C_{\mathcal{S}} - 1)\sigma_M^2\right) - 1 + 2(1 + C_{\mathcal{S}})\left(1 + \frac{C_{\mathcal{S}}}{(1+\gamma)^2}\right)\sqrt{\delta_{4\ell-3p}} + \left((1 + C_{\mathcal{S}})\left(-1 - \frac{C_{\mathcal{S}}}{(1+\gamma)^2} + (C_{\mathcal{S}} - 1)\sigma_M^2\right) + 2\sqrt{C_{\mathcal{S}}} + 2\right)\delta_{4\ell-3p} < 0. \qquad (53)$$

Notice that the only difference of the above compared to (49) is that we have $+2$ instead of $+0.5$ in the coefficient of $\delta_{4\ell-3p}$ and this is due to the difference we mentioned before in the denominator in (52). The LHS of (53) is a quadratic function of $\sqrt{\delta_{4\ell-3p}}$. As before, we notice that if (38) holds then the constant term of the above is negative and thus $\delta_{ASP}(C_{\mathcal{S}}, \sigma_M^2, \gamma) > 0$ exists and is the square of the positive solution of the quadratic function. □

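Having Theorem 4.12 we can immediately have the following corollary which is similar to the one we have for ACoSaMP. The proof resembles the one of Corollary 4.8 and is omitted.

Corollary 4.13. Consider the problem $\mathcal{P}$ and apply ASP with $a = \frac{2\ell-p}{\ell}$. If (38) holds and $\delta_{4\ell-3p} \le \delta_{ASP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$, where $C_{\mathcal{S}}$ and $\gamma$ are as in Theorem 4.7 and $\delta_{ASP}(C_{\mathcal{S}}, \sigma_M^2, \gamma)$ is a constant guaranteed to be greater than zero whenever (38) is satisfied, then for any

$$t \ge t^* = \left\lceil \frac{\log(\|x\|_2 / \|e\|_2)}{\log\left(1 \Big/ \sqrt{\frac{1 + \delta_{2\ell-p}}{1 - \delta_{2\ell-p}}}\,\rho_1\rho_2\right)} \right\rceil,$$

$$\|x - \hat{x}^t_{ASP}\|_2 \le \left(1 + \frac{1 - \left(\sqrt{\frac{1 + \delta_{2\ell-p}}{1 - \delta_{2\ell-p}}}\,\rho_1\rho_2\right)^t}{1 - \sqrt{\frac{1 + \delta_{2\ell-p}}{1 - \delta_{2\ell-p}}}\,\rho_1\rho_2}\left(\sqrt{\frac{1 + \delta_{2\ell-p}}{1 - \delta_{2\ell-p}}}(\eta_1 + \rho_1\eta_2) + \frac{2}{\sqrt{1 - \delta_{2\ell-p}}}\right)\right)\|e\|_2, \qquad (54)$$

implying that ASP leads to a stable recovery. The constants $\eta_1$, $\eta_2$, $\rho_1$ and $\rho_2$ are the same as in Theorem 4.7.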

4.4. Non-Exact Cosparse Case 

In the above guarantees we have assumed that the signal $x$ is $\ell$-cosparse. In many cases, it is not exactly $\ell$-cosparse but only nearly so. Denoting by $x^\ell = Q_{\mathcal{S}^*_\ell(x)}x$ the best $\ell$-cosparse approximation of $x$, we have the following theorem that provides us with a guarantee also for this case.

Theorem 4.14. Consider a variation of problem $\mathcal{P}$ where $x$ is a general vector, and apply either AIHT or AHTP, both with either a constant or a changing step size; or ACoSaMP or ASP with $a = \frac{2\ell-p}{\ell}$, and all are used with a zero initialization. Under the same conditions of Theorems 4.1 and 4.2 we have for any $t \ge t^*$

$$\|x - \hat{x}^t\|_2 \le \|x - x^\ell\|_2 + c\,\|M(x - x^\ell)\|_2 + c\,\|e\|_2, \qquad (55)$$

where $t^*$ and $c$ are the constants from Theorems 4.1 and 4.2.

Proof: First we notice that we can rewrite $y = Mx^\ell + M(x - x^\ell) + e$. Denoting $e^\ell = M(x - x^\ell) + e$ we can use Theorems 4.1 and 4.2 to recover $x^\ell$ and have

$$\|x^\ell - \hat{x}^t\|_2 \le c\,\|e^\ell\|_2. \qquad (56)$$

Using the triangle inequality for $\|x - \hat{x}^t\|_2$ with the above gives

$$\|x - \hat{x}^t\|_2 \le \|x - x^\ell\|_2 + \|x^\ell - \hat{x}^t\|_2 \le \|x - x^\ell\|_2 + c\,\|e^\ell\|_2. \qquad (57)$$

Using again the triangle inequality for $\|e^\ell\|_2 \le \|e\|_2 + \|M(x - x^\ell)\|_2$ provides us with the desired result. □



4.5. Theorem Conditions 

Having the results of the theorems we ask ourselves whether their conditions are feasible. As we have seen in the introduction of this section we need $C_\ell$ and $C_{2\ell-p}$ to be close to one for satisfying the conditions of the theorems. Using the thresholding in (23) for cosupport selection with a unitary $\Omega$ satisfies the conditions in a trivial way since $C_\ell = C_{2\ell-p} = 1$. This case coincides with the synthesis model for which we already have theoretical guarantees. For a general $\Omega$, the constants of the cosupport selection scheme in (23) do not equal one and are not even expected to be close to one [25]. It is interesting to ask whether there exists an efficient general projection scheme that guarantees small constants for any given operator $\Omega$, or for specifically structured $\Omega$. We leave these questions as a subject for future work. Instead, we show empirically in the next section that a weaker projection scheme that does not fulfill all the requirements of the theorems leads to a good reconstruction result. This suggests that even in the absence of good near optimal projections we may still use the algorithms practically.
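For concreteness, the cosupport selection step can be sketched as follows. This is a minimal illustration that assumes the thresholding rule referred to as (23) picks the $\ell$ rows of $\Omega$ whose analysis coefficients $|\Omega z|$ are smallest; the function name and the plain-list matrix representation are choices made here, and the sketch does not compute the projection onto the resulting subspace.

```python
def cosupport_by_thresholding(omega, z, ell):
    """Return the indices of the ell smallest-magnitude entries of omega @ z.

    omega is a list of rows (each a list of floats), z a list of floats.
    """
    coeffs = [sum(w * v for w, v in zip(row, z)) for row in omega]
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    return sorted(order[:ell])

# Unitary example (identity): the selection keeps the smallest entries of z itself.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(cosupport_by_thresholding(identity, [3.0, 0.0, -0.1, 2.0], 2))

# 1D finite-difference example: the flat part of the signal is selected.
diff = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]]
print(cosupport_by_thresholding(diff, [1.0, 1.0, 5.0], 1))
```

With a unitary operator this selection is the optimal one, which is the trivial case mentioned above; for a general $\Omega$ it is only a heuristic whose constants $C_\ell$ need not be close to one.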

The second condition of the theorems is that the RIP constants should be small. In the synthesis case, where
$\Omega$ is unitary, it was shown for a certain family of random matrices, such as matrices with Bernoulli or Subgaussian
ensembles, that for any value of if m > -^klog(-^-) then 6k < ek H 0, Eltl ■ A similar result for the same family of 



random matrices holds for the analysis case. The result is a special case of the result presented in [27].



Theorem 4.15 (Theorem 3.3 in li27|]). Let M e E'"^"' be a random matrix such that for any z e W' and <e < \ it 

satisfies 

P(|||Mz|i-||z|i|>ez2)<e-^, (58) 
where Cm > Q is a constant. For any value of ec > 0, ;/ 

m > (log(|L™™'^|) + {d-r) log(9/e,) + t) , (59) 

then < er with probability exceeding 1 — e^'. 

For completeness we present a proof of the theorem in Appendix Appendix F based on We include 



in it also the proof of Theorem 4.16 to follow. In the case that $\Omega$ is in general position $|\mathcal{L}_r^{corank}| = \binom{p}{r} \le \left(\frac{ep}{p-r}\right)^{p-r}$ (the inequality is by Stirling's formula) and thus $m \ge (p - r)\log\left(\frac{ep}{p-r}\right)$. Since we want $m$ to be smaller than $d$ we need $p - \ell$ to be smaller than $d$. This limits the size of $p$ for $\Omega$ since $r$ cannot be greater than $d$. Thus, we present a variation of the theorem which states the results in terms of $\ell$ instead of $r$.



m> -^L-01og(— ^) + r), (60) 



Theorem 4.16. Under the same setup ofTheorem \4.15\ for any ec > if 

then 5f < ec with probability exceeding 1 — e^'. 

Remark that when $\Omega$ is in general position $\ell$ cannot be greater than $d$ and thus $p$ cannot be greater than $2d$ [16]. For this reason, if we want to have large values for $p$ we should allow linear dependencies between the rows of $\Omega$. In this case the cosparsity of the signal can be greater than $d$. This explains why linear dependencies are a favorable thing in analysis dictionaries [23].



4.6. Comparison to Other Works 

Among the existing theoretical works that studied the performance of analysis algorithms 1 16, 2^ 24 1, the result 



that resembles ours is the result for $\ell_1$-analysis in [19]. This work analyzed the $\ell_1$-analysis minimization problem with a synthesis perspective. The analysis dictionary $\Omega$ was replaced with the conjugate of a synthesis dictionary $D$ which is assumed to be a tight frame, resulting in the following minimization problem.

$$\hat{x}_{A-\ell_1} = \operatorname{argmin}_z \|D^*z\|_1 \quad \text{s.t.} \quad \|y - Mz\|_2 \le \epsilon. \qquad (61)$$



It was shown that if x has a A;-sparse representation under D and M has the D-RIP 11 19112711 with djk < 0.6, an extension 
of the synthesis RIP, then 



||D*x - [D*x ],|ii 



pA-t, - x||2 < + \ (62) 



We say that a matrix $M$ has a D-RIP with a constant $\delta_k$ if for any signal $z$ that has a $k$-sparse representation under $D$

$$(1 - \delta_k)\|z\|_2^2 \le \|Mz\|_2^2 \le (1 + \delta_k)\|z\|_2^2. \qquad (63)$$



The authors in [19] presented this result as a synthesis result that allows linear dependencies in $D$ at the cost of limiting the family of signals to be those for which $\|D^*x - [D^*x]_k\|_1$ is small. However, having the analysis perspective, we can realize that they provided a recovery guarantee for $\ell_1$-analysis under the new analysis model for the case that $\Omega$ is a tight frame. An easy way to see it is to observe that for an $\ell$-cosparse signal $x$, setting $k = p - \ell$, we have that $\|\Omega x - [\Omega x]_{p-\ell}\|_1 = 0$ and thus in the case $\epsilon = 0$ we get that (62) guarantees the recovery of $x$ by using (61) with $D^* = \Omega$. Thus, though the result in [19] was presented as a reconstruction guarantee for the synthesis model, it is actually a guarantee for the analysis model.

Our main difference from [19] is that the proof technique relies on the analysis model and not on the synthesis one and that the results presented here are for general operators and not only for tight frames. However, this is also the drawback of our approach since we require the existence of a near optimal projection.

In the non-exact sparse case our results differ from the one in (62) in the sense that they look at the projection error and not at the values of $\Omega x$. It would be interesting to see if there is a connection between the two and whether one implies the other.

5. Experiments 



In this section we repeat some of the experiments performed in [16] for the noiseless case ($e = 0$) and some of the
experiments performed in [21] for the noisy case^.



5.1. Targeted Cosparsity

Just as in the synthesis counterpart of the proposed algorithms, where a target sparsity level $k$ must be selected before running the algorithms, we have to choose the targeted cosparsity level which will dictate the projection steps. In the synthesis case it is known that it may be beneficial to over-estimate the sparsity $k$. Similarly in the analysis framework the question arises: In terms of recovery performance, does it help to under-estimate the cosparsity $\ell$? A tentative positive answer comes from the following heuristic: Let $\tilde\Lambda$ be a subset of the cosupport $\Lambda$ of the signal $x$ with $\tilde\ell := |\tilde\Lambda| < \ell = |\Lambda|$. According to Proposition 3 in [16]

$$\kappa_\Omega(\tilde\ell) \le \frac{m}{2} \qquad (64)$$

is a sufficient condition to identify $\tilde\Lambda$ in order to recover $x$ from the relations $y = Mx$ and $\Omega_{\tilde\Lambda}x = 0$. $\kappa_\Omega(\tilde\ell) = \max_{\tilde\Lambda \in \mathcal{L}_{\tilde\ell}} \dim(\mathcal{W}_{\tilde\Lambda})$ is a function of $\tilde\ell$. Therefore, we can replace $\ell$ with the smallest $\tilde\ell$ that satisfies (64) as the effective cosparsity in the algorithms. Since it is easier to identify a smaller cosupport set it is better to run the algorithm with the smallest possible value of $\tilde\ell$, in the absence of noise. In the presence of noise, larger values of $\tilde\ell$ allow a better denoising. Note that in some cases the smallest possible value of $\tilde\ell$ will be larger than the actual cosparsity of $x$. In this case we cannot replace $\ell$ with $\tilde\ell$.

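We take two examples for selecting $\tilde\ell$. The first is for $\Omega$ which is in general position and the second is for $\Omega_{DIF}$, the finite difference analysis operator that computes horizontal and vertical discrete derivatives of an image, which is strongly connected to the total variation (TV) norm minimization. For $\Omega$ that is in general position $\kappa_\Omega(\ell) = \max(d - \ell, 0)$ [16]. In this case we choose

$$\tilde\ell = \min\left(d - \frac{m}{2}, \ell\right). \qquad (65)$$

For $\Omega_{DIF}$ we have $\kappa_{\Omega_{DIF}}(\ell) \ge d - \frac{\ell}{2} - \sqrt{\frac{\ell}{2}} - 1$ [16] and

$$\tilde\ell = \left\lceil \min\left(\left(-1/\sqrt{2} + \sqrt{2d - m - 1.5}\right)^2, \ell\right) \right\rceil. \qquad (66)$$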

Replacing $\ell$ with $\tilde\ell$ is more relevant to AIHT and AHTP than to ACoSaMP and ASP since in the last two we intersect cosupport sets and thus the estimated cosupport set needs to be large enough to avoid empty intersections. Thus, for $\Omega$ in general position we use the true cosparsity level for ACoSaMP and ASP. For $\Omega_{DIF}$, where linear dependencies occur, the corank does not equal the cosparsity and we use $\tilde\ell$ instead of $\ell$ since it will be favorable to run the algorithm targeting a cosparsity level in the middle. In this case $\tilde\ell$ tends to be very large and it is more likely to have non-empty intersections.
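The two selection rules can be written down directly. The sketch below assumes that (65) reads $\tilde\ell = \min(d - m/2, \ell)$ and that (66) reads $\tilde\ell = \lceil \min((-1/\sqrt{2} + \sqrt{2d - m - 1.5})^2, \ell) \rceil$, the latter being the value of $\ell$ at which the lower bound $d - \ell/2 - \sqrt{\ell/2} - 1$ equals $m/2$. The function names and the example dimensions are illustrative only.

```python
import math

def targeted_cosparsity_general(d, m, ell):
    """min(d - m/2, ell): smallest level with max(d - level, 0) <= m/2, capped at ell."""
    return min(d - m / 2.0, ell)

def targeted_cosparsity_dif(d, m, ell):
    """ceil(min((-1/sqrt(2) + sqrt(2d - m - 1.5))^2, ell)), assuming m <= d."""
    root = -1.0 / math.sqrt(2.0) + math.sqrt(2.0 * d - m - 1.5)
    return math.ceil(min(root * root, ell))

def dif_kappa_lower_bound(d, ell):
    """d - ell/2 - sqrt(ell/2) - 1, the bound quoted for the finite difference operator."""
    return d - ell / 2.0 - math.sqrt(ell / 2.0) - 1.0

# Hypothetical sizes: a general-position operator with d = 120, and a 16 x 16 image (d = 256).
print(targeted_cosparsity_general(120, 60, 100))
print(targeted_cosparsity_dif(256, 100, 450))
```

Evaluating `dif_kappa_lower_bound` at the unrounded root of (66) returns $m/2$, which confirms that the closed form solves the quadratic in $\sqrt{\ell}$ obtained from (64).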



A matlab package with code for the experiments performed in this paper is in preparation for an open source distribution. 



Figure 1: Recovery rate for a random tight frame with $p = 144$ and $d = 120$. From left to right, up to bottom: (a) AIHT with a constant step-size, (b) AIHT with an adaptive changing step-size, (c) AHTP with a constant step-size, (d) AHTP with an adaptive changing step-size, (e) ACoSaMP with $a = \frac{2\ell-p}{\ell}$, (f) ACoSaMP with $a = 1$, (g) ASP with $a = \frac{2\ell-p}{\ell}$, (h) ASP with $a = 1$, (i) A-$\ell_1$-minimization and GAP.



5.2. Phase Diagrams for Synthetic Signals in the Noiseless Case 

We begin with synthetic signals in the noiseless case. We test the performance of AIHT with a constant step-size, AIHT with an adaptive changing step-size, AHTP with a constant step-size, AHTP with an adaptive changing step-size, ACoSaMP with $a = \frac{2\ell-p}{\ell}$, ACoSaMP with $a = 1$, ASP with $a = \frac{2\ell-p}{\ell}$ and ASP with $a = 1$. We compare the results to those of A-$\ell_1$-minimization [18] and GAP [16]. We use a random matrix $M$ and a random tight frame with $d = 120$ and $p = 144$, where each entry in the matrices is drawn independently from the Gaussian distribution.

We draw a phase transition diagram [33] for each of the algorithms. We test 20 different possible values of $m$ and 20 different values of $\ell$ and for each pair repeat the experiment 50 times. In each experiment we check whether we have a perfect reconstruction. White cells in the diagram denote a perfect reconstruction in all the experiments of the pair and black cells denote total failure in the reconstruction. The values of $m$ and $\ell$ are selected according to the following formula:

$$m = \delta d, \qquad \ell = d - \rho m, \qquad (67)$$

where $\delta$, the sampling rate, is the x-axis of the phase diagram and $\rho$, the ratio between the cosparsity of the signal and the number of measurements, is the y-axis.
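The grid of test points implied by (67) can be generated as follows. This is a sketch under stated assumptions: the 20 axis values are taken to be equally spaced in $(0, 1]$ and the resulting $m$ and $\ell$ are rounded to the nearest integer, neither of which is specified in the text; `phase_grid` is a name introduced here.

```python
def phase_grid(d, deltas, rhos):
    """Map each (delta, rho) cell to integer (m, ell) via m = delta * d, ell = d - rho * m."""
    grid = []
    for rho in rhos:
        row = []
        for delta in deltas:
            m = int(round(delta * d))
            ell = int(round(d - rho * m))
            row.append((m, ell))
        grid.append(row)
    return grid

# Hypothetical axis sampling: 20 equally spaced values in (0, 1] on each axis.
steps = [(i + 1) / 20 for i in range(20)]
grid = phase_grid(120, steps, steps)
print(grid[0][0], grid[9][9], grid[19][19])
```

Each cell of `grid` then corresponds to one (m, ℓ) pair on which the 50 repetitions would be run.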

Figure 1 presents the reconstruction results of the algorithms. It should be observed that AIHT and AHTP have better performance using the adaptive step-size than using the constant step-size. The optimal step-size gives a reconstruction result similar to that of the adaptive one and is thus not presented. For ACoSaMP and ASP we observe that it is better to use $a = 1$ instead of $a = \frac{2\ell-p}{\ell}$. Compared to each other we see that ACoSaMP and ASP achieve better recovery than AHTP and AIHT. Between the last two, AHTP is better. Though AIHT has inferior behavior, we should mention that with regards to running time AIHT is the most efficient. Afterwards we have AHTP and then ACoSaMP and ASP. Compared to $\ell_1$ and GAP we observe that ACoSaMP and ASP have competitive results.

Figure 2: Recovery rate for a random tight frame with $p = 240$ and $d = 120$ (up) and a finite difference operator (bottom). From left to right: AIHT and AHTP with an adaptive changing step-size, and ACoSaMP and ASP with $a = 1$.

With the above observations, we turn to test operators with higher redundancy and see the effect of linear
dependencies in them. We test two operators. The first is a random tight frame as before but with a redundancy factor
of 2. The second is the two-dimensional finite difference operator, also known as the two-dimensional total
variation operator (2D-TV). In Fig. 2 we present the phase diagrams for both operators using AIHT with an adaptive
changing step-size, AHTP with an adaptive changing step-size, ACoSaMP with $a = 1$, and ASP with $a = 1$. As
observed before, ACoSaMP and ASP outperform AIHT and AHTP for both operators, and AHTP
outperforms AIHT. We mention again that the better performance comes at the cost of higher complexity. In addition,
as we expected, having redundancies in $\Omega$ results in a better recovery.
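To make the linear dependencies of the 2D finite difference operator concrete, here is a minimal sketch that builds a dense version of it for a small $n \times n$ image. The helper name `finite_difference_2d`, the row-by-row vectorization and the non-periodic boundary handling are assumptions for this example and may differ from the setup used in the experiments.

```python
import numpy as np

def finite_difference_2d(n):
    """Dense 2D finite-difference operator for an n x n image.

    The image is stacked row by row into a vector of length n*n.
    Non-periodic boundaries are an assumption of this sketch.
    """
    d = n * n
    rows = []
    for i in range(n):
        for j in range(n):
            k = i * n + j
            if j + 1 < n:              # horizontal neighbour difference
                r = np.zeros(d)
                r[k], r[k + 1] = -1.0, 1.0
                rows.append(r)
            if i + 1 < n:              # vertical neighbour difference
                r = np.zeros(d)
                r[k], r[k + n] = -1.0, 1.0
                rows.append(r)
    return np.vstack(rows)

omega = finite_difference_2d(4)
```

For $n = 4$ the operator has 24 rows but rank 15, so many of its rows are linearly dependent, which is the kind of redundancy discussed above.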



5.3. Reconstruction of High Dimensional Images in the Noisy Case 

We turn now to test the methods for high dimensional signals. We use RASP and RACoSaMP for the reconstruction
of the Shepp-Logan phantom from a small number of measurements. The sampling operator is a two-dimensional
Fourier transform that measures only a certain number of radial lines from the Fourier transform. The cosparse operator
is the 2D-TV. The phantom image is presented in Fig. 3(a). Using RACoSaMP and RASP we get a perfect
reconstruction using only 15 radial lines, i.e., only $m = 3782$ measurements out of $d = 65536$, which is less than
6 percent of the data in the original image. The algorithms require less than 20 iterations to achieve this perfect
recovery. For AIHT and RAHTP we achieve a reconstruction which is only close to the original image using 35
radial lines. The reconstruction result of AIHT is presented in Fig. 3(b). The advantage of AIHT, though it has
an inferior performance, over the other methods is its running time. While the others need several minutes for each
reconstruction, AIHT takes only a few seconds to achieve a visually reasonable result.

Exploring the noisy case, we perform a reconstruction using RASP of a noisy measurement of the phantom with
22 radial lines and a signal-to-noise ratio (SNR) of 20. Figure 3(c) presents the noisy image, the result of applying
the inverse Fourier transform on the measurements, and Fig. 3(d) presents its reconstruction result. Note that for the
minimization process we use conjugate gradients in each iteration, take only the real part of the result and crop
the values of the resulting image to be in the range [0, 1]. We get a peak SNR (PSNR) of 36dB. We get similar
results using RACoSaMP but using more radial lines (25).

Figure 3: From left to right: (a) Shepp-Logan phantom image, (b) AIHT reconstruction using 35 radial lines (noiseless), (c) noisy image with SNR of 20 and (d) recovered
image using RASP and only 22 radial lines. Note that for the noiseless case RASP and RACoSaMP get a perfect reconstruction using only 15
radial lines.
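The post-processing and quality measure mentioned above can be sketched as follows. The helper names `postprocess` and `psnr` are hypothetical, and the PSNR definition used here (peak value 1 for images in $[0, 1]$, mean squared error over all pixels) is the standard one, assumed rather than taken from the text.

```python
import numpy as np

def postprocess(image):
    """Keep the real part and crop the values to the range [0, 1]."""
    return np.clip(np.real(image), 0.0, 1.0)

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB (standard definition, assumed here)."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A reconstruction would first be passed through `postprocess` and then compared against the clean phantom with `psnr`.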

6. Discussion and Conclusion 

In this work we presented new pursuits for the cosparse analysis model. A theoretical study of these algorithms
was performed, giving guarantees for stable recovery under the assumptions of the $\Omega$-RIP and the existence of a
near optimal projection. In addition, we showed experimentally that using simpler kinds of projections is possible in
order to get good reconstruction results. We demonstrated both in the theoretical and the empirical results that linear
dependencies within the analysis dictionary are favorable and enhance the recovery performance.

We are aware that there are still some open questions in this work and we leave them for future research. This
should deal with the following:

• Our work assumed the existence of a procedure that finds a cosupport that implies a near optimal projection
with a constant $C_\ell$. An important question that arises from this assumption is: for which types of $\Omega$ and values
of $C_\ell$ can we find an efficient procedure that implies a near optimal projection?

• As we have seen in the simulations, the thresholding procedure, though not near optimal with the constants
required by the theorems, provides good reconstruction results. A theoretical study of the analysis greedy-like
techniques with this cosupport selection scheme is required.

• A family of analysis dictionaries that deserves special attention is the family of tight frame operators. In
synthesis, there is a parallel between the guarantees of $\ell_1$-synthesis and the greedy-like algorithms. The fact
that a guarantee with a tight frame exists for $\ell_1$-analysis encourages us to believe that similar guarantees exist
also for the analysis greedy-like techniques.

• In this paper, the noise e was considered to be adversarial. The random white Gaussian case was considered for
the synthesis case in [41], resulting in near-oracle performance guarantees. It would be interesting to verify
whether this is also the case for the analysis framework.



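Appendix A. Proof of Lemma 4.5

Lemma 4.5. Consider the problem $\mathcal{P}$ and apply either AIHT or AHTP with a constant step size $\mu$ satisfying
$\frac{1}{\mu} \ge 1 + \delta_{2\ell-p}$ or an optimal step size (in which case the bound holds with $\mu = \frac{1}{1+\delta_{2\ell-p}}$). Then, at the $t$-th iteration, the following holds:

$$\begin{aligned}
\|y - M\hat{x}^t\|_2^2 - \|y - M\hat{x}^{t-1}\|_2^2 \le{} & C_\ell \left( \|y - Mx\|_2^2 - \|y - M\hat{x}^{t-1}\|_2^2 \right) \\
& + C_\ell \left( \frac{1}{\mu(1-\delta_{2\ell-p})} - 1 \right) \|M(x - \hat{x}^{t-1})\|_2^2 + (C_\ell - 1)\mu\sigma_M^2 \|y - M\hat{x}^{t-1}\|_2^2 .
\end{aligned}$$

Proof: We consider the AIHT algorithm first. We take similar steps to those taken in the proof of Lemma 3 in
[13]. Since $\frac{1}{\mu} \ge 1 + \delta_{2\ell-p}$, we have, from the $\Omega$-RIP property of M,

$$\|M(\hat{x}^t - \hat{x}^{t-1})\|_2^2 \le \frac{1}{\mu} \|\hat{x}^t - \hat{x}^{t-1}\|_2^2 .$$

Thus,

$$\begin{aligned}
\|y - M\hat{x}^t\|_2^2 - \|y - M\hat{x}^{t-1}\|_2^2 &= -2\langle M(\hat{x}^t - \hat{x}^{t-1}), y - M\hat{x}^{t-1}\rangle + \|M(\hat{x}^t - \hat{x}^{t-1})\|_2^2 \\
&\le -2\langle M(\hat{x}^t - \hat{x}^{t-1}), y - M\hat{x}^{t-1}\rangle + \frac{1}{\mu}\|\hat{x}^t - \hat{x}^{t-1}\|_2^2 \\
&= -2\langle \hat{x}^t - \hat{x}^{t-1}, M^*(y - M\hat{x}^{t-1})\rangle + \frac{1}{\mu}\|\hat{x}^t - \hat{x}^{t-1}\|_2^2 \\
&= -\mu\|M^*(y - M\hat{x}^{t-1})\|_2^2 + \frac{1}{\mu}\|\hat{x}^t - \hat{x}^{t-1} - \mu M^*(y - M\hat{x}^{t-1})\|_2^2 .
\end{aligned}$$

Note that by definition, $\hat{x}^t = Q_{\hat{S}_\ell}\left(\hat{x}^{t-1} + \mu M^*(y - M\hat{x}^{t-1})\right)$. Hence, by the $C_\ell$-near optimality of the projection, we get

$$\|y - M\hat{x}^t\|_2^2 - \|y - M\hat{x}^{t-1}\|_2^2 \le -\mu\|M^*(y - M\hat{x}^{t-1})\|_2^2 + \frac{C_\ell}{\mu}\|x - \hat{x}^{t-1} - \mu M^*(y - M\hat{x}^{t-1})\|_2^2 . \tag{A.1}$$

Now note that

$$\begin{aligned}
\|x - \hat{x}^{t-1} - \mu M^*(y - M\hat{x}^{t-1})\|_2^2 &= \|x - \hat{x}^{t-1}\|_2^2 - 2\mu\langle M(x - \hat{x}^{t-1}), y - M\hat{x}^{t-1}\rangle + \mu^2\|M^*(y - M\hat{x}^{t-1})\|_2^2 \\
&\le \frac{1}{1-\delta_{2\ell-p}}\|M(x - \hat{x}^{t-1})\|_2^2 + \mu\left(\|y - Mx\|_2^2 - \|y - M\hat{x}^{t-1}\|_2^2 - \|M(x - \hat{x}^{t-1})\|_2^2\right) \\
&\quad + \mu^2\|M^*(y - M\hat{x}^{t-1})\|_2^2 .
\end{aligned}$$

Putting this into (A.1), and using $\|M^*(y - M\hat{x}^{t-1})\|_2^2 \le \sigma_M^2\|y - M\hat{x}^{t-1}\|_2^2$, we obtain the desired result for the AIHT algorithm.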

We can check that the same holds true for the AHTP algorithm as follows: suppose that $\hat{x}^{t-1}_{\mathrm{AHTP}}$ is the $(t-1)$-st
estimate from the AHTP algorithm. If we now initialize the AIHT algorithm with this estimate and obtain the next
estimate $\hat{x}^{t}_{\mathrm{AIHT}}$, then the inequality of the lemma holds true with $\hat{x}^{t}_{\mathrm{AIHT}}$ and $\hat{x}^{t-1}_{\mathrm{AHTP}}$ in place of $\hat{x}^t$ and $\hat{x}^{t-1}$ respectively. On
the other hand, from the algorithm description, we know that the $t$-th estimate $\hat{x}^{t}_{\mathrm{AHTP}}$ of the AHTP satisfies

$$\|y - M\hat{x}^{t}_{\mathrm{AHTP}}\|_2^2 \le \|y - M\hat{x}^{t}_{\mathrm{AIHT}}\|_2^2 .$$

This means that the result holds for the AHTP algorithm as well.

Using a similar argument for the optimal changing step size we note that it selects the cosupport that minimizes
$\|Mx - M\hat{x}^t\|_2^2$. Thus, for AIHT and AHTP we have that $\|Mx - M\hat{x}^t_{\mathrm{opt}}\|_2^2 \le \|Mx - M\hat{x}^t_{\mu}\|_2^2$ for any value of $\mu$, where
$\hat{x}^t_{\mathrm{opt}}$ and $\hat{x}^t_{\mu}$ are the recovery results of AIHT or AHTP with an optimal changing step-size and a constant step-size
respectively. This yields that any theoretical result for a constant step-size selection with a constant $\mu$ holds true also
for the optimal changing step-size selection. In particular this is true also for $\mu = \frac{1}{1+\delta_{2\ell-p}}$. This choice is justified in the
proof of Lemma l431 □ 
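The AIHT update analyzed in this proof, a gradient step followed by a cosupport selection and a projection, can be sketched as below. This is only an illustration under stated assumptions: the cosupport is chosen by simple thresholding of $\Omega z$ (the scheme referred to in the experiments, which is not guaranteed to be near optimal), $M$ is real so that $M^* = M^T$, the step size is constant, and the function names are hypothetical.

```python
import numpy as np

def threshold_cosupport(omega, z, ell):
    """Indices of the ell smallest-magnitude entries of omega @ z."""
    return np.argsort(np.abs(omega @ z))[:ell]

def project_cosupport(omega, cosupport, z):
    """Orthogonal projection of z onto the null space of omega[cosupport]."""
    sub = omega[cosupport]
    return z - np.linalg.pinv(sub) @ (sub @ z)

def aiht(M, omega, y, ell, mu, n_iter=50):
    """Sketch of AIHT with a constant step size mu (real-valued M assumed)."""
    x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        z = x + mu * M.T @ (y - M @ x)                  # gradient step
        cosupport = threshold_cosupport(omega, z, ell)  # cosupport selection
        x = project_cosupport(omega, cosupport, z)      # projection
    return x
```

An AHTP-style variant would replace the last projection by a least-squares fit of $y$ restricted to the selected cosupport, which is why its residual can only be smaller than that of the AIHT step.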



Appendix B. Proof of Lemma 4.6

Lemma 4.6. Suppose that the same conditions of Theorem 4.4 hold true. If $\|y - M\hat{x}^{t-1}\|_2^2 \le \eta^2\|e\|_2^2$, then
$\|y - M\hat{x}^{t}\|_2^2 \le \eta^2\|e\|_2^2$. Furthermore, if $\|y - M\hat{x}^{t-1}\|_2^2 > \eta^2\|e\|_2^2$, then

$$\|y - M\hat{x}^t\|_2^2 \le c_4 \|y - M\hat{x}^{t-1}\|_2^2 ,$$

where

$$c_4 := \left(1 + \frac{1}{\eta}\right)^2 \left(\frac{1}{\mu(1-\delta_{2\ell-p})} - 1\right) C_\ell + (C_\ell - 1)(\mu\sigma_M^2 - 1) + \frac{C_\ell}{\eta^2} < 1 .$$

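Proof: First, suppose that $\|y - M\hat{x}^{t-1}\|_2^2 > \eta^2\|e\|_2^2$. From Lemma 4.5 we have

$$\|y - M\hat{x}^t\|_2^2 \le C_\ell\|y - Mx\|_2^2 + C_\ell\left(\frac{1}{\mu(1-\delta_{2\ell-p})} - 1\right)\|M(x - \hat{x}^{t-1})\|_2^2 + (C_\ell - 1)(\mu\sigma_M^2 - 1)\|y - M\hat{x}^{t-1}\|_2^2 . \tag{B.1}$$

Remark that all the coefficients in the above are positive because $1 + \delta_{2\ell-p} \le \frac{1}{\mu} \le \sigma_M^2$ and $C_\ell \ge 1$. Since $y - Mx = e$,
we note

$$\|y - Mx\|_2^2 < \frac{1}{\eta^2}\|y - M\hat{x}^{t-1}\|_2^2$$

and, by the triangle inequality,

$$\|M(x - \hat{x}^{t-1})\|_2 \le \|y - Mx\|_2 + \|y - M\hat{x}^{t-1}\|_2 < \left(1 + \frac{1}{\eta}\right)\|y - M\hat{x}^{t-1}\|_2 .$$

Therefore, from (B.1),

$$\|y - M\hat{x}^t\|_2^2 \le c_4\|y - M\hat{x}^{t-1}\|_2^2 .$$

This is the second part of the lemma.

Now, suppose that $\|y - M\hat{x}^{t-1}\|_2^2 \le \eta^2\|e\|_2^2$. This time we have

$$\|M(x - \hat{x}^{t-1})\|_2 \le \|y - Mx\|_2 + \|y - M\hat{x}^{t-1}\|_2 \le (1 + \eta)\|e\|_2 .$$

Applying this to (B.1), we obtain

$$\begin{aligned}
\|y - M\hat{x}^t\|_2^2 &\le C_\ell\|e\|_2^2 + (C_\ell - 1)(\mu\sigma_M^2 - 1)\eta^2\|e\|_2^2 + C_\ell\left(\frac{1}{\mu(1-\delta_{2\ell-p})} - 1\right)(1+\eta)^2\|e\|_2^2 \\
&= \left[C_\ell + (C_\ell - 1)(\mu\sigma_M^2 - 1)\eta^2 + C_\ell\left(\frac{1}{\mu(1-\delta_{2\ell-p})} - 1\right)(1+\eta)^2\right]\|e\|_2^2 = c_4\eta^2\|e\|_2^2 .
\end{aligned}$$

Thus, the proof is complete as soon as we show $c_4 < 1$, or $c_4 - 1 < 0$.

To see $c_4 - 1 < 0$, we first note that it is equivalent to (all the subscripts are dropped from here on for simplicity
of notation)

$$\frac{1}{\mu^2} - \frac{2(1-\delta)}{1 + \frac{1}{\eta}}\cdot\frac{1}{\mu} + \frac{(C - 1)\sigma^2(1-\delta)}{C\left(1 + \frac{1}{\eta}\right)^2} < 0,$$

or

$$\frac{1}{\mu^2} - 2(1-\delta)b_1\frac{1}{\mu} + (1-\delta)^2 b_2 < 0 .$$

Solving this quadratic equation in $\frac{1}{\mu}$, we want

$$(1-\delta)\left(b_1 - \sqrt{b_1^2 - b_2}\right) < \frac{1}{\mu} < (1-\delta)\left(b_1 + \sqrt{b_1^2 - b_2}\right) .$$

Such $\mu$ exists only when $\frac{b_2}{b_1^2} < 1$. Furthermore, we have already assumed $1 + \delta \le \frac{1}{\mu}$ and we know $(1-\delta)\left(b_1 - \sqrt{b_1^2 - b_2}\right) < 1 + \delta$, and hence the condition we require is

$$1 + \delta \le \frac{1}{\mu} < (1-\delta)\left(b_1 + \sqrt{b_1^2 - b_2}\right),$$

which is what we desired to prove.

As we have seen in Lemma 4.5, for changing optimal step-size selection (34) holds for any value of $\mu$ that satisfies
the above conditions. Thus, we will select the one that minimizes $c_4$ and have

$$\frac{1}{\mu} = \sqrt{b_2}\,(1 - \delta_{2\ell-p}) . \tag{B.2}$$

Since we need $\frac{1}{\mu} \ge 1 + \delta_{2\ell-p}$ and have that $\sqrt{b_2}\,(1 - \delta_{2\ell-p}) < b_1(1 - \delta_{2\ell-p}) < 1 + \delta_{2\ell-p}$, we set $\frac{1}{\mu} = 1 + \delta_{2\ell-p}$. □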



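Appendix C. Proof of Lemma 4.9

Lemma 4.9. Consider the problem $\mathcal{P}$ and apply ACoSaMP with $a = \frac{2\ell-p}{\ell}$. For each iteration we have

$$\|x - x_p\|_2 \le \frac{1}{\sqrt{1 - \delta_{4\ell-3p}^2}}\|P_{\tilde\Lambda^t}(x - x_p)\|_2 + \frac{\sqrt{1+\delta_{3\ell-2p}}}{1 - \delta_{4\ell-3p}}\|e\|_2 .$$

Proof: Since $x_p$ is the minimizer of $\|y - Mv\|_2^2$ with the constraint $\Omega_{\tilde\Lambda^t}v = 0$, then

$$\langle Mx_p - y, Mu\rangle = 0, \tag{C.1}$$

for any vector $u$ such that $\Omega_{\tilde\Lambda^t}u = 0$. Substituting $y = Mx + e$ and moving terms from the LHS to the RHS gives

$$\langle x_p - x, M^*Mu\rangle = \langle e, Mu\rangle, \tag{C.2}$$

where $u$ is a vector satisfying $\Omega_{\tilde\Lambda^t}u = 0$. Turning to look at $\|Q_{\tilde\Lambda^t}(x - x_p)\|_2^2$ and using (C.2) with $u = Q_{\tilde\Lambda^t}(x - x_p)$, we
have

$$\begin{aligned}
\|Q_{\tilde\Lambda^t}(x - x_p)\|_2^2 &= \langle x - x_p, Q_{\tilde\Lambda^t}(x - x_p)\rangle \\
&= \langle x - x_p, (I - M^*M)Q_{\tilde\Lambda^t}(x - x_p)\rangle - \langle e, MQ_{\tilde\Lambda^t}(x - x_p)\rangle \\
&\le \|x - x_p\|_2\,\|Q_{\Lambda\cap\tilde\Lambda^t}(I - M^*M)Q_{\tilde\Lambda^t}\|_2\,\|Q_{\tilde\Lambda^t}(x - x_p)\|_2 + \|e\|_2\|MQ_{\tilde\Lambda^t}(x - x_p)\|_2 \\
&\le \delta_{4\ell-3p}\|x - x_p\|_2\|Q_{\tilde\Lambda^t}(x - x_p)\|_2 + \|e\|_2\sqrt{1+\delta_{3\ell-2p}}\,\|Q_{\tilde\Lambda^t}(x - x_p)\|_2 ,
\end{aligned} \tag{C.3}$$

where the first inequality follows from the Cauchy-Schwartz inequality, the projection property that $Q_{\tilde\Lambda^t} = Q_{\tilde\Lambda^t}Q_{\tilde\Lambda^t}$
and the fact that $x - x_p = Q_{\Lambda\cap\tilde\Lambda^t}(x - x_p)$. The last inequality is due to the $\Omega$-RIP properties, Corollary 2.5 and that
according to Table 1 $|\tilde\Lambda^t| \ge 3\ell - 2p$ and $|\Lambda\cap\tilde\Lambda^t| \ge 4\ell - 3p$. After simplification of (C.3) by $\|Q_{\tilde\Lambda^t}(x - x_p)\|_2$ we have

$$\|Q_{\tilde\Lambda^t}(x - x_p)\|_2 \le \delta_{4\ell-3p}\|x - x_p\|_2 + \sqrt{1+\delta_{3\ell-2p}}\,\|e\|_2 .$$

Utilizing the last inequality with the fact that $\|x - x_p\|_2^2 = \|P_{\tilde\Lambda^t}(x - x_p)\|_2^2 + \|Q_{\tilde\Lambda^t}(x - x_p)\|_2^2$ gives

$$\|x - x_p\|_2^2 \le \|P_{\tilde\Lambda^t}(x - x_p)\|_2^2 + \left(\delta_{4\ell-3p}\|x - x_p\|_2 + \sqrt{1+\delta_{3\ell-2p}}\,\|e\|_2\right)^2 . \tag{C.4}$$

By moving all terms to the LHS we get a quadratic function of $\|x - x_p\|_2$. Thus, $\|x - x_p\|_2$ is bounded from above by
the larger root of that function; this with a few simple algebraic steps gives the inequality in (43). □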
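The "few simple algebraic steps" can be spelled out as follows. The shorthand $a$, $b$, $\delta$ and $s$ is introduced only for this sketch: $a = \|x - x_p\|_2$, $b = \|P_{\tilde\Lambda^t}(x - x_p)\|_2$, $\delta = \delta_{4\ell-3p}$ and $s = \sqrt{1+\delta_{3\ell-2p}}\,\|e\|_2$, so that (C.4) reads $a^2 \le b^2 + (\delta a + s)^2$.

```latex
\begin{aligned}
&(1-\delta^2)\,a^2 - 2\delta s\,a - (b^2 + s^2) \le 0 \\
&\Longrightarrow\quad
a \le \frac{\delta s + \sqrt{\delta^2 s^2 + (1-\delta^2)(b^2 + s^2)}}{1-\delta^2}
   = \frac{\delta s + \sqrt{s^2 + (1-\delta^2)\,b^2}}{1-\delta^2} \\
&\phantom{\Longrightarrow\quad a}
\le \frac{(1+\delta)\,s + \sqrt{1-\delta^2}\;b}{1-\delta^2}
   = \frac{b}{\sqrt{1-\delta^2}} + \frac{s}{1-\delta} .
\end{aligned}
```

The last inequality uses $\sqrt{u + v} \le \sqrt{u} + \sqrt{v}$ for nonnegative $u, v$; substituting back the definitions of $b$ and $s$ gives the bound stated in the lemma.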
Appendix D. Proof of Lemma 4.10

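Lemma 4.10. Consider the problem $\mathcal{P}$ and apply ACoSaMP with $a = \frac{2\ell-p}{\ell}$. For each iteration we have

$$\|x - \hat{x}^t\|_2 \le \rho_1\|P_{\tilde\Lambda^t}(x - x_p)\|_2 + \eta_1\|e\|_2 ,$$

where $\eta_1$ and $\rho_1$ are the same constants as in Theorem 4.7.
Proof: We start with the following observation

$$\|x - \hat{x}^t\|_2^2 = \|x - x_p + x_p - \hat{x}^t\|_2^2 = \|x - x_p\|_2^2 + \|\hat{x}^t - x_p\|_2^2 + 2(x - x_p)^*(x_p - \hat{x}^t), \tag{D.1}$$

and turn to bound the second and last terms in the RHS. For the second term, using the fact that $\hat{x}^t = Q_{\hat{S}_\ell(x_p)}x_p$ with
(21) gives

$$\|\hat{x}^t - x_p\|_2^2 \le C_\ell\|x - x_p\|_2^2 . \tag{D.2}$$

For bounding the last term, we look at its absolute value and use (C.2) with $u = x_p - \hat{x}^t = Q_{\tilde\Lambda^t}(x_p - \hat{x}^t)$. This leads to

$$|(x - x_p)^*(x_p - \hat{x}^t)| = |(x - x_p)^*(I - M^*M)(x_p - \hat{x}^t) - e^*M(x_p - \hat{x}^t)| .$$

By using the triangle and Cauchy-Schwartz inequalities with the fact that $x - x_p = Q_{\Lambda\cap\tilde\Lambda^t}(x - x_p)$ and $x_p - \hat{x}^t =
Q_{\tilde\Lambda^t}(x_p - \hat{x}^t)$ we have

$$\begin{aligned}
|(x - x_p)^*(x_p - \hat{x}^t)| &\le \|x - x_p\|_2\,\|Q_{\Lambda\cap\tilde\Lambda^t}(I - M^*M)Q_{\tilde\Lambda^t}\|_2\,\|x_p - \hat{x}^t\|_2 + \|e\|_2\|M(x_p - \hat{x}^t)\|_2 \\
&\le \delta_{4\ell-3p}\|x - x_p\|_2\|x_p - \hat{x}^t\|_2 + \sqrt{1+\delta_{3\ell-2p}}\,\|e\|_2\|x_p - \hat{x}^t\|_2 ,
\end{aligned} \tag{D.3}$$

where the last inequality is due to the $\Omega$-RIP definition and Corollary 2.5.
By substituting (D.2) and (D.3) into (D.1) we have

$$\begin{aligned}
\|x - \hat{x}^t\|_2^2 &\le (1 + C_\ell)\|x - x_p\|_2^2 + 2\delta_{4\ell-3p}\sqrt{C_\ell}\,\|x - x_p\|_2^2 + 2\sqrt{1+\delta_{3\ell-2p}}\sqrt{C_\ell}\,\|e\|_2\|x - x_p\|_2 \\
&= \left(\left(1 + 2\delta_{4\ell-3p}\sqrt{C_\ell} + C_\ell\right)\|x - x_p\|_2 + 2\sqrt{(1+\delta_{3\ell-2p})C_\ell}\,\|e\|_2\right)\|x - x_p\|_2 .
\end{aligned}$$

Now, combining the inequality of Lemma 4.9 as a first step and using the fact that $\delta_{4\ell-3p} \le 1$ with a few algebraic
steps as a second step gives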

II ^,||2 1 + 254f_3p VQ + Q ||_ , ,||2 ., 

Il'^ - II2 ^ IIPa'^'^ - ""pA (D.4) 

^ "4{-3p 

VTT^^d + VQ)' |, ^ ,| „ „ (l+63(-2p)(l + VQ)' „ „2 

||PA-(X-Xp)||2l|e||2' 



Pa,(x-x,,)^ + — ^ -^llelb 

/l -52 l-S4t-3p 
■\J ^ "4{-3p 

Taking the square root of both sides provides the desired result. □

Appendix E. Proof of Lemma 4.11

Lemma \4.11\ Consider the problem V and apply ACoSaMP with a - ^y^- if 

then there exists ~5 AcosaMp(C2f-,„ cr^, r) > such that for any 5ic-p < 6 ac.smp{C2(-i„ o"^, r) 

||Pa,(x - x„)\\^ < 772 ||e||2 + P2 ||x - x'-i . 



The constants $\eta_2$ and $\rho_2$ are as defined in Theorem 4.7.

In the proof of the lemma we use the following Proposition.

Proposition Appendix E.1. For any two given vectors $x_1$, $x_2$ and any constant $c > 0$ it holds that

$$\|x_1 + x_2\|_2^2 \le (1 + c)\|x_1\|_2^2 + \left(1 + \frac{1}{c}\right)\|x_2\|_2^2 . \tag{E.1}$$

The proof of the proposition is immediate using the inequality of arithmetic and geometric means. We turn to the
proof of the lemma.
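For completeness, the arithmetic-geometric mean step behind the proposition can be written out as below; this is a standard argument added here as a sketch, not text taken from the original proof. It expands the squared norm, applies the Cauchy-Schwarz inequality, and then uses $2uv \le c\,u^2 + \frac{1}{c}v^2$ with $u = \|x_1\|_2$ and $v = \|x_2\|_2$.

```latex
\begin{aligned}
\|x_1 + x_2\|_2^2
&= \|x_1\|_2^2 + 2\langle x_1, x_2\rangle + \|x_2\|_2^2 \\
&\le \|x_1\|_2^2 + 2\|x_1\|_2\|x_2\|_2 + \|x_2\|_2^2 \\
&\le \|x_1\|_2^2 + c\,\|x_1\|_2^2 + \frac{1}{c}\|x_2\|_2^2 + \|x_2\|_2^2
 = (1 + c)\|x_1\|_2^2 + \left(1 + \frac{1}{c}\right)\|x_2\|_2^2 .
\end{aligned}
```

The "altered version" used later for lower bounds follows from the same inequality applied to $x_1 = (x_1 + x_2) + (-x_2)$ and rearranged.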

Proof: Looking at the step of finding new cosupport elements one can observe that Qa^ is a near optimal projection 
for M*y'"' = M*(y - Mx'"') with a constant C2t-p- The fact that |A'"' n A| >2{ - p combined with (EB gives 



||(I - QAjM*(y - Mx'-')||2 < C21-P ||(I - QA,-.nA)M*(y - Mx'-')!!^ . 
Using simple projection properties and the fact that A' c Aa with z - M*(y - Mx' ') we have 

IIQa-HI' ^ I|Qa.z||J = \\z\\l - ||(I - QaJzIIJ > \\z\\l - C2/-„ 11(1 - QA'-.nA)z|l2 (E.2) 

= M\l - C21-P {\Ml - IIQA'-.nAzli) = C21-P IIQa-hazIIz - (C21-P - 1) \Ml . 

We turn to bound the LHS of ( IE.2b from above. Noticing that y = Mx + e and using Proposition [Appendix E. 1 1 
with a constant yi > gives 



||QA,M*(y - Mx'-')||2 ^ ( 1 + ^ ) l|QA'M*e||2 + (1 + n) ||Qa'M-M(x - x'-i)!!^ . (E.3) 



Using Proposition Appendix E. 1 again, now with a constant o- > 0, we have 

||Qa,M*M(x - x'-')\\l <{l+a) ||Qa,(x - x'-')||' + (l + -] ||Qa,(I - M*M)(x - x'-')||' (E.4) 



< (1 + ff) ||x - x'-'ll' -il+a) ||Pa,(x - x'-')\\l + (1 + ^) ||Qa'(I - M*M)(x - x'-')\\l . 
Putting ( IE.4I 1 into ( IE. 31 ) and using ( fTSI ) and Corollarv 12.21 gives 

||QA,M*(y - Mx'-% < (l+y')(^+^3/-2;,) ||^||2 _ (J ^ ^ ||p^ _ 5^ 

+ |l + a + 64i-ip + -^^j(l +Ti)||x-x'"'||2- 

We continue with bounding the RHS of (IE. 21 ) from below. For the first element of the RHS we use an altered 
version of Proposition Appendix E. 1 with a constant 72 > and have 



||QA'-.nAM*(y - Mx'-')||' > ||QA,-.nAM*M(x - x')]]' - - IIQA'-'nAM*e||^ . (E.6) 

1+72 72 
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Using the altered form again, for the second element in the RHS, with a constant > gives 



||Q^,-,nAM*M(x - x')||2 ^Y^h-^'^fi-^ ||QA,-.nA(M*M - I)(x - x'-')^ . 



I|2 1 



Putting (IE. 7b in ( IE.6b and using the RIP properties and ( fTSl l provide 



lA'-'nA 



M*(y - Mx' 



1 



621- 



1 



l2^\l+yS /3 / I +72"'' "2 



,2 (l+(52;-„) 



72 



(E.7) 



(E.8) 



Using Proposition Appendix E. 1 with a constant 73 > 0, (|9]l and some basic algebraic steps we have for the second 

(E.9) 



element in the RHS of ( IE2I 1 

||M*(y-Mx'-i)||2 < (l+r3)||M-M(x-x'-i)||^ + (l + lJ||M*e||i 

1 



< (1 + r3)(l + S2i-p)cri, (X - x'-') L + 1 + - Urjj ||e||^ 



By combining (lE.SI l, ( IE.8I 1 and ( |E.9b with ( IE. 2b we have 



(l+a)(l+ri)||PA,(x-x' 

1 



'-Kl|2 ^ (1 +ri)(l +53/-2p) ,, ,,2 



e|l2 + C21- 



73 



(1 + ^2Z-p) ,, ,,2 



72 



(E.IO) 



+(C2i-p - 1) ( 1 + — I cr^ ||e||^ + \l+a + + 



a 



(1 x-x 



'-11 



+{C2l-p - 1)(1 + r3)(l + S2l-p)crlf ||(X - X'-')||2 - C2,-p 

Dividing both sides by (1 + a)(l and gathering coefficients give 



1 



S21-P 



1+/3 y6 ; 1 + 72 



X - X 



||Pa,(x-x'"')ii: < 



2 ^ / 1 + S3i-2p (1 + ^2/-p)C2;-p (C2/-P - 1)(1 +y3)o -f 
72(1 +71) (1 +a)(l +71)73 

(l + ^ 

\ a 



M 



(E.ll) 



S4l-3p {C2I-P - 1)(1 + 73)(1 + 52/-p)cr^ 



(1 +a)(l +71) 



C2I- 



(l+ff)(l+7i)(l+72)\l+/? 

II 1 1'^ 

The smaller the coefficient of ||x - x'"'||~, the better convergence guarantee we obtain. Thus, we choose (3 



and a - 



"21-, 



I (i+ri)(i+72)^ 



1 'C2/-„-l)(l+r3H't^'2/-pl°-K 



SO that the coefficient is minimized. This yields 



Pa,(x-x' 



/-K||2 / 1 + ^3l-2p (1 + 52l-p)C2l-p {C2I-P - 1)(1 + 73)0-5 



/ i + 03/-2j 
\7i(l + a 



M ' 



7i{l+a) 72(1 +a')(l +71) (1 +Q')(1 +71)73 



3p ■ 



C21-P r-. n2 (C2/-p-l)(l+73)(l+52;-p)cr^ 



(1 +7i)(l +72) 



1 + 79) ^ ' 



1 +71 



X - X 



The values of $\gamma_1, \gamma_2, \gamma_3$ provide a tradeoff between the convergence rate and the size of the noise coefficient. For
smaller values we get a better convergence rate but higher amplification of the noise. We make no optimization on their
values and choose them to be $\gamma_1 = \gamma_2 = \gamma_3 = \gamma$ for an appropriate $\gamma > 0$. Thus we have



l|PA'(x-x'-')|g< 



1 + 631-2P (1 + S2l-p)C2l-p {C2I-P - 1)(1 + y)(Tl 
H ■ H 

7(1 + a) 7(1 + a)(l + 7) 



(l+a)(l+7)7 / 



(E.12) 



'4/-3p ■ 



/ C2I-P 

'(1+7)' 



(1 - ^/^,f - (C21-P -m+ S2,-p)^M ) I 



X - X 






Since P/^iXp — P^jx' ' = the above inequality holds also for ||P^,(x-x' Inequality ( |46] | follows since the 
right-hand side of dE. 12b is smaller than the square of the right-hand side of ( l46l l. 

Before ending the proof, we notice that p2, the coefficient of ||x - x'"'||t is defined only when 

(C21-P - 1)(1 + d2i-p)crli < (1 - V^)' ■ (E.13) 

First we notice that since 1 + 621-p ^ (l ^ s/^ii-p) a necessary condition for ( IE.13I I to hold is {C21-P - 1)o"m < (i+y^ 
which is equivalent to ( |45] ). By moving the terms in the RHS to the LHS we get a quadratic function of ^jS2(-p- 
The condition in (l45T l guarantees that its constant term is smaller than zero and thus there exists a positive S21-P 
for which the function is smaller than zero. Therefore, for any 62c-p < ^ACosiMpCCif-p, o"^, y) ( IE. 13b holds, where 
^ACosiMpCCaf-p, cr^, y) > is the square of the positive solution of the quadratic function. 

□ 



Appendix F. Proofs of Theorem 4.15 and Theorem 4.16



Theorem WJ5\ (Theorem 3.3 in t'Al ): Let M € M'' 

Q<~e<\ 

p(|||Mz||5-||z||? 

where Cm > is a constant. For any value of e, > 0, if 

32 



be a random matrix that satisfies that for any z € R'' and 



> ezj) < , 



m > (log(|L™™''|) + (d-r) log(9/e,) + f) , 



then 5^^°™^ < e,. with probability exceeding I - e ' . 

Theorem \4.16\ Under the same setup of Theorem l4.15l for any > if 



m > 



32 



{p-{)\og 



9p 



(p - €)ec 



+ t 



then 6( < ee with probability exceeding I - e ' . 

Proof: Let e = e,/4, B''-' = {z e R''-'', ||z||2 < 1) and an e-net for B''-' with size |^| < (l + ff'' HI. For 
any subspace = "Wa n B^^'' such that A e L™""'' we can build an orthogonal matrix Uji e M''^*''"''' such that 
nV* = {UaZ, z e W'-'} = VaB''-'. It is easy to see that *Pa = Va^"'-' is an e-net for and that ^P^^k = U 
is an 6-net for y[=°i-™k p, gd^ ^j^gj.^ Ivj, 



< L' 



corank 



(1 



We could stop here and use directly Theorem 2.1 from 13 ill to get the desired result for Theorem l4.15l However, 
we present the remaining of the proof using a proof technique from ll32i [7ll. Using union bound and the properties of 



M we have that with probability exceeding 1 - L' 



corank 



(1 + h'-'e 



every v € ^^.orank satisfies 



(l-e)||v||5<||Mv||^ <(l+e)||v||5. 



(F.l) 



According to the definition of Sf™^ it holds that Vl + 5™™'' - sup^g^^co^k^ig,, ||Mv||2. Since Jl''°™^nB'' is a compact 
set there exists vo e n B'' that achieves the supremum. Denoting by v its closest vector in *P^corank and using 

the definition of ^I*^ corunk We have ||vo - v||2 < e. This yields 



1 + 6';" 



IIMV0II2 < ||Mv||2 + ||M(vo-v)||2 

Vo - V 



(F.2) 



vr 



e + 



M- 



llvo - VII2 



llvo- 



< vr 



+ e + 



7 



1+6"; 



:corank~ 



The first inequality is due to the triangle inequality; the second one follows from (IF. II ) and arithmetics; and the last 

inequality follows from the definition of df""^, the properties of e-net and the fact that || ||vo°-v'||, [[^ ~ ^- Reordering 
(IF.2I 1 gives 

^ ^ ^corank < JjLL < 1 + 4e = 1 + (F.3) 

(1 - ey 

where the inequality holds because < 0.5 and e = j < |. Since we want ( IF.3b to hold with probability greater than 

1 - e^' it remains to require |L™™''| (1 + jY^'^e < e^' . Using the fact that (1 + |-) > j and some arithmetics we 
get (|59T l and this completes the proof of the theorem. 

We turn now to the proof of Theorem 4.16. Its proof is almost identical to the previous proof, with the difference
that instead of $r$, $L_r^{\mathrm{corank}}$ and $\delta_r^{\mathrm{corank}}$ we look at $\ell$, $L_\ell$ and $\delta_\ell$. In this case we do not know the dimension of
the subspace that each cosupport implies. However, we can have a lower bound on it using $p - \ell$. Therefore, we use
$B^{p-\ell}$ instead of $B^{d-r}$. This change provides us with a condition similar to (59) but with $p - \ell$ in the second coefficient
instead of $d - r$. By using some arithmetics, noticing that the size of $L_\ell$ is $\binom{p}{\ell}$ and using Stirling's formula for upper
bounding it we get (60) and this completes the proof.
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